Loading US census data

Introduction

The American Community Survey (ACS) is different from the decennial Census, it asked more questions to provide information about education, housing, jobs and more. The ACS is conducted each month in order to provide current information to communities every year the decennial census is conducted every 10 years and ask a shorter set of questions to provide an official count of the entire US population.

The ACS is especially important local communities depend on information from this survey to decide where schools, highways, hospitals, and other important services are needed.
The 5-year estimates from the ACS are "period" estimates that represent data collected over a period of time. The primary advantage of using multiyear estimates is the increased statistical reliability of the data for less populated areas and small population subgroups, and unlike the 1-year estimates, geographies do not have to meet a particular population threshold in order to be published.

The web api call
https://api.census.gov/data/2018/acs/acs5?get=variables&for=county:*&in=state:*

The complete list of available variables can be found here:

https://api.census.gov/data/2018/acs/acs5/variables.html

Loading the data

To begin with the data load, Open Power BI desktop and select Get Data

Use the filter in the left side to search and select Python script, then click Connect.

Add this code to the script form

import requests

import pandas as pd

HOST = "https://api.census.gov/data"

years = ["2010","2011","2012","2013","2014","2015","2016","2017","2018"]

dataset = "acs/acs5"

get_vars = ["NAME","B01001_001E","B01001_002E","B01001_026E",

"B01001A_001E","B01001B_001E","B01001C_001E",

"B01001D_001E","B01001E_001E","B01001F_001E","B01001G_001E"]

predicates = {}

predicates["get"] = ",".join(get_vars)

predicates["for"] = "county:*"

predicates["in"] = "state:*"

col_names = ["county_name","total_pop","total_male","total_female","tot_white","tot_black", "tot_ind", "tot_asian", "tot_haw", "tot_other", "tot_mult","state_id","county_id"]

df = pd.DataFrame(columns=["county_name","total_pop","total_male","total_female",

"tot_white", "tot_black", "tot_ind", "tot_asian","tot_haw", "tot_other", "tot_mult",

"state_id","county_id","census_year"])

for year in years:

base_url = "/".join([HOST, year, dataset])

r = requests.get(base_url, params=predicates)

ydf = pd.DataFrame(columns=col_names, data=r.json()[1:])

ydf["census_year"] = year

df = df.append(ydf)

then select OK

The script will execute, and the Navigator list and preview the available data frames.

Select df and then Transform Data.

Rename the table to US Population ACS5 and select Close & Apply.

Now repeat the process to load the States table using this python code.

import requests

import pandas as pd

HOST = "https://api.census.gov/data"

year = "2018"

dataset = "acs/acs5"

predicates = {}

get_vars = ["NAME"]

predicates["get"] = ",".join(get_vars)

predicates["for"] = "state:*"

col_names = ["state_name","state_id"]

base_url = "/".join([HOST, year, dataset])

r = requests.get(base_url, params=predicates)

states = pd.DataFrame(columns=col_names, data=r.json()[1:])

Back in Power BI desktop, change the summarization method from Sum to Don’t Summarize to county_id, census_year and state_id

Change state_id summarization method from Sum to Don’t Summarize.

Change the state_name column category to State or Province.

The model automatically will link both tables by state_id

Now with the census data loaded, we can start adding new columns and measures into the model.

SQLBIDA

Search This Blog