Skip to main content

Loading US census data

Introduction

The American Community Survey (ACS) is different from the decennial Census, it asked more questions to provide information about education, housing, jobs and more. The ACS is conducted each month in order to provide current information to communities every year the decennial census is conducted every 10 years and ask a shorter set of questions to provide an official count of the entire US population.

The ACS is especially important local communities depend on information from this survey to decide where schools, highways, hospitals, and other important services are needed.
The 5-year estimates from the ACS are "period" estimates that represent data collected over a period of time. The primary advantage of using multiyear estimates is the increased statistical reliability of the data for less populated areas and small population subgroups, and unlike the 1-year estimates, geographies do not have to meet a particular population threshold in order to be published.

The web api call
https://api.census.gov/data/2018/acs/acs5?get=variables&for=county:*&in=state:*

The complete list of available variables can be found here: 


Loading the data

To begin with the data load, Open Power BI desktop and select Get Data


Use the filter in the left side to search and select Python script, then click Connect.


Add this code to the script form

import requests
import pandas as pd

HOST = "https://api.census.gov/data"
years = ["2010","2011","2012","2013","2014","2015","2016","2017","2018"]
dataset = "acs/acs5"

get_vars = ["NAME","B01001_001E","B01001_002E","B01001_026E",
"B01001A_001E","B01001B_001E","B01001C_001E",
"B01001D_001E","B01001E_001E","B01001F_001E","B01001G_001E"]

predicates = {}
predicates["get"] = ",".join(get_vars)
predicates["for"] = "county:*"
predicates["in"] = "state:*"

col_names = ["county_name","total_pop","total_male","total_female","tot_white","tot_black", "tot_ind", "tot_asian", "tot_haw", "tot_other", "tot_mult","state_id","county_id"]

df = pd.DataFrame(columns=["county_name","total_pop","total_male","total_female",
                           "tot_white", "tot_black", "tot_ind", "tot_asian","tot_haw", "tot_other", "tot_mult",
                           "state_id","county_id","census_year"])

for year in years:
    base_url = "/".join([HOST, year, dataset])
    r = requests.get(base_url, params=predicates)
    ydf = pd.DataFrame(columns=col_names, data=r.json()[1:])
    ydf["census_year"] = year
    df = df.append(ydf)



then select OK

The script will execute, and the Navigator list and preview the available data frames.

Select df and then Transform Data.


Rename the table to US Population ACS5 and select Close & Apply.





Now repeat the process to load the States table using this python code.

import requests
import pandas as pd

HOST = "https://api.census.gov/data"
year = "2018"
dataset = "acs/acs5"

predicates = {}

get_vars = ["NAME"]

predicates["get"] = ",".join(get_vars)
predicates["for"] = "state:*"

col_names = ["state_name","state_id"]

base_url = "/".join([HOST, year, dataset])
r = requests.get(base_url, params=predicates)
states = pd.DataFrame(columns=col_names, data=r.json()[1:])


Back in Power BI desktop, change the summarization method from Sum to Don’t Summarize to county_id, census_year and state_id




Change state_id summarization method from Sum to Don’t Summarize.
Change the  state_name column category to State or Province.




The model automatically will link both tables by state_id



Now with the census data loaded, we can start adding new columns and measures into the model.

Comments

Popular posts from this blog

Building the dashboard (Part 1)

To show how to create the dashboard I divided the canvas in 3 sections I going to dedicate one post for every section, and the another 2 posts for the tooltip and drill through   Adding Census Year slicer First we are going to add the census year filter. This filter will affect all visual in our dashboard. Select the slicer visual and the census_year field from the US Population ACS5 table Before diving into specifics formats, disable/enable the following formatting options to match the bellow image. Now in the selection control section change to Single select   Change the Text size, Alignment, and Color for Items,  Title and Border sections The final result Adding the Cards Select the Card visual and drag and drop total_population measure into the Fields box Rename the Fields from total_population → Total Population The formatting options are: In the Data label section, change...

Building the dashboard (part 3)

Adding the top 10 counties chart We are going to add to the Census Dashboard a bar chart to show the top 10 most populated counties, also if you select States in the Map/Table, the chart need to update properly, for example if you select California and Texas in the Map, the chart needs to show the top 10 counties combining only those two States.  First we add a clustered bar chart, using County name and total_population for the Axis and Values respectively. then modify the County Filter to only include the top 10 counties by total population Now we need to change how the Map/Table interaction filters the bar chart data. Select the Map, and then Format/Edit Interaction   in the ribbon. Change the bar chart interaction type from Highlight   to Filter    Now switch to the Table visual and repeat the process Select the bar chart change the Title Text to the calculated measure chart_header.   select the measure The next post will be dedicated to adding the Drill...

Modifying the model

Adding new measures. This is a simple dashboard, just one filter (census survey year) and  few calculated measures. The measures are: Total Population Previous Year Population Change Percent Population Change Percent of US Population also, there are two calculated measures to format and add additional information to the dashboard. Value color (Green if current year population is greater than previous year population otherwise Red) Most populated counties chart header (to show selected states) To add a new measure, right-click on the US Population ACS5 table and select New Measure the add the formula for total_population and select the thousands separator format Repeat this process to add prev_census_tot_pop = VAR   PreviousCensusYear  =      MAX  (  'US Population ACS5'[census_year]  )  -  1 RETURN      CALCULATE  (  ...