censusdata
Discrepancy between API results and downloaded table from Census data.gov
I am noticing a discrepancy between the data obtained through this API and the data obtained directly from the Census data website (via its table download feature).
The table I am looking at specifically is B18108 (disability-related data), but from the look of it, this affects other tables as well. I am using county-level data from the 2019 ACS 1-year estimates.
The following code is used:
result = censusdata.download('acs1', 2019, censusdata.censusgeo([('county', '*')]), datafields)
censusdata.export.exportcsv('censusdata-api.csv', result)
(datafields is the list of all variables related to B18108.)
The values themselves are not incorrect, but they seem to correspond to the wrong county. The first couple of rows are correct, but the subsequent rows are not. For example, the data for Los Angeles County from the API matches Napa County in the downloaded table, and the Rock County, WI data from the API matches Scioto County, OH.
Can you post a full example? Here's my own, which shows that B18108_001E from the API is identical to a CSV export from data.census.gov at the county level. The two tables do not list counties in the same row order (and you should not assume they do), but the values are not inaccurate once the rows are matched on NAME.
import pandas as pd
import censusdata as cd
datafields = ['B18108_001E']
result = cd.download('acs1', 2019, cd.censusgeo([('county', '*')]), datafields)
cd.export.exportcsv('censusdata-api.csv', result)
dfcd = pd.read_csv('censusdata-api.csv')
# dfcd.shape  # (840, 4)
#Downloaded B18108 table from 2019 ACS1 via https://data.census.gov/cedsci/table?q=B18108%3A%20AGE%20BY%20NUMBER%20OF%20DISABILITIES&g=0100000US%240500000&tid=ACSDT1Y2019.B18108
dfacs = pd.read_csv('ACSDT1Y2019.B18108.csv', skiprows=[1])
dfacs = dfacs[['B18108_001E', 'NAME']]
# dfacs.shape  # (840, 2)
df = dfacs.merge(dfcd, on='NAME', suffixes=['_acs', '_cd'])
df['B18108_001E_acs'].equals(df['B18108_001E_cd']) #True
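To illustrate the row-order point without downloading anything, here is a minimal sketch using only the standard library. The numbers are made up for illustration and are not real ACS estimates, and `load_by_name` and `compare_by_name` are hypothetical helpers, not part of censusdata. It shows that comparing two tables row by row can report a mismatch even when every county's value agrees once rows are paired by NAME.

```python
import csv
import io

# Toy data only: the values are invented for illustration, not real ACS estimates.
# Both tables hold the same values per county but list the counties in different order.
API_CSV = """NAME,B18108_001E
"Los Angeles County, California",300
"Napa County, California",100
"Rock County, Wisconsin",200
"""

TABLE_CSV = """NAME,B18108_001E
"Napa County, California",100
"Rock County, Wisconsin",200
"Los Angeles County, California",300
"""


def load_by_name(text, field):
    """Read CSV text into a {NAME: value} dict (hypothetical helper)."""
    return {row["NAME"]: int(row[field]) for row in csv.DictReader(io.StringIO(text))}


def compare_by_name(left, right):
    """Return (names whose values differ, names present on only one side)."""
    shared = left.keys() & right.keys()
    mismatched = sorted(name for name in shared if left[name] != right[name])
    unmatched = sorted(left.keys() ^ right.keys())
    return mismatched, unmatched


api = load_by_name(API_CSV, "B18108_001E")
table = load_by_name(TABLE_CSV, "B18108_001E")

# Position-based comparison pairs row i of one file with row i of the other.
positional_equal = list(api.values()) == list(table.values())

# Key-based comparison pairs rows that share the same NAME.
mismatched, unmatched = compare_by_name(api, table)

print(positional_equal)       # False: the row order differs
print(mismatched, unmatched)  # [] []: every county agrees once matched by NAME
```

If `unmatched` comes back non-empty on real files, that would point to NAME strings that differ between the two sources, which is worth checking before concluding the values themselves disagree.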
@steventrev thanks for replying. Now that @jtleider has left, are we keeping the data current? Is there a list of bugs, features, documentation, or other work that needs doing?
@datatalking - this package and its documentation continue to work presently. The package can support 2020 data by adding new tables to the /censusdata/variables/ path, which many forks (including my own) have done. I'm a greenhorn in this space, but will support where I can.
@steventrev are you supporting the censusdata package going forward from your repo? If so, I'd like to collaborate. I'm green to the censusdata package but have used the data within it for years. Hopefully my Python and other skills can be of use. I see this package as worth (some) maintaining.
@datatalking I doubt my capability beyond my refresh of the input files. Would a better course of action be to request the reins from @jtleider?