Introduction
The American Community Survey (ACS) is different from the decennial Census, it asked more questions to provide information about education, housing, jobs and more. The ACS is conducted each month in order to provide current information to communities every year the decennial census is conducted every 10 years and ask a shorter set of questions to provide an official count of the entire US population.The ACS is especially important local communities depend on information from this survey to decide where schools, highways, hospitals, and other important services are needed.
The 5-year estimates from the ACS are "period" estimates that represent data collected over a period of time. The primary advantage of using multiyear estimates is the increased statistical reliability of the data for less populated areas and small population subgroups, and unlike the 1-year estimates, geographies do not have to meet a particular population threshold in order to be published.
The web api call
https://api.census.gov/data/2018/acs/acs5?get=variables&for=county:*&in=state:*
The complete list of available variables can be found here:
Loading the data
To begin with the data load, Open Power BI desktop and select Get Data
Use the filter in the left side to search and select Python script,
then click Connect.
import requests
import pandas as pd
HOST =
"https://api.census.gov/data"
years =
["2010","2011","2012","2013","2014","2015","2016","2017","2018"]
dataset =
"acs/acs5"
get_vars =
["NAME","B01001_001E","B01001_002E","B01001_026E",
"B01001A_001E","B01001B_001E","B01001C_001E",
"B01001D_001E","B01001E_001E","B01001F_001E","B01001G_001E"]
predicates = {}
predicates["get"]
= ",".join(get_vars)
predicates["for"]
= "county:*"
predicates["in"]
= "state:*"
col_names =
["county_name","total_pop","total_male","total_female","tot_white","tot_black",
"tot_ind", "tot_asian", "tot_haw",
"tot_other",
"tot_mult","state_id","county_id"]
df =
pd.DataFrame(columns=["county_name","total_pop","total_male","total_female",
"tot_white", "tot_black", "tot_ind",
"tot_asian","tot_haw", "tot_other",
"tot_mult",
"state_id","county_id","census_year"])
for year in years:
base_url = "/".join([HOST, year,
dataset])
r = requests.get(base_url, params=predicates)
ydf = pd.DataFrame(columns=col_names,
data=r.json()[1:])
ydf["census_year"] = year
df = df.append(ydf)
The script will execute, and the Navigator list and
preview the available data frames.
Select df and then Transform Data.
Rename the table to US Population ACS5 and select Close
& Apply.
Now repeat the process to load the States table using this
python code.
import requests
import pandas as pd
HOST =
"https://api.census.gov/data"
year = "2018"
dataset =
"acs/acs5"
predicates = {}
get_vars =
["NAME"]
predicates["get"]
= ",".join(get_vars)
predicates["for"]
= "state:*"
col_names =
["state_name","state_id"]
base_url =
"/".join([HOST, year, dataset])
r =
requests.get(base_url, params=predicates)
states =
pd.DataFrame(columns=col_names, data=r.json()[1:])
Back in Power BI desktop, change the summarization method from Sum to Don’t Summarize to county_id, census_year and state_id
Change state_id summarization method from Sum to Don’t
Summarize.
Change the state_name
column category to State or Province.
The model automatically will link both tables by state_id
The model automatically will link both tables by state_id








Comments
Post a Comment