tidycensus
Values of "1939-" and "2014+" converted to 0, 18 but looks like a census (not tidycensus) issue
@walkerke initially we thought this was an issue with tidycensus, but it looks like it's related to the census API. But I think it's worth letting you know about.
We were interested in the median age of housing (table B25035, median year structure built) and used this tidycensus code:
library(tidycensus)

# Median year structure built, by census tract, for Texas
dat <- get_acs(
  geography = "tract",
  state = "TX",
  table = "B25035",
  survey = "acs5",
  year = 2020,
  geometry = TRUE
)
We noticed that some of the median values were "0" and "18" rather than the expected four-digit years.
So we downloaded the data as a CSV directly from the Census here and noticed that the "0" and "18" seemed to correspond to the character values of "1939-" and "2014+".
We assumed this was a tidycensus problem, but then we constructed the API call manually, and hitting the API directly with this call also returns the 0 and 18 values:
https://api.census.gov/data/2020/acs/acs5?get=B25035_001E%2CB25035_001M%2CNAME&for=tract%3A%2A&in=state%3A48
Thanks @zross for the heads up! I'll think about how to handle this; I see a couple directions. One would be to convert to NA; the other would be to manually top- and bottom-code by converting to 1939 and 2014 (or whatever it is for a particular ACS year), respectively. I see arguments for and against both options (though keeping it as-is isn't an option, I think).
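The two directions could be sketched roughly as follows. This is a minimal base R illustration, not tidycensus code: the helper name `recode_year_built` and the sample vector are hypothetical, and the mapping of 0 to "1939-" and 18 to "2014+" is assumed from the CSV comparison described above for the 2020 5-year ACS. Other ACS years may use different cutoffs.

```r
# Hypothetical helper, not part of tidycensus.
# Assumes 0 stands for "1939-" and 18 stands for "2014+" (2020 5-year ACS).
recode_year_built <- function(estimate, method = c("na", "code"),
                              bottom = 1939, top = 2014) {
  method <- match.arg(method)
  is_bottom <- !is.na(estimate) & estimate == 0
  is_top <- !is.na(estimate) & estimate == 18
  if (method == "na") {
    # Option 1: treat the coded values as missing
    estimate[is_bottom | is_top] <- NA
  } else {
    # Option 2: replace the coded values with the cutoff years
    estimate[is_bottom] <- bottom
    estimate[is_top] <- top
  }
  estimate
}

# Made-up estimates for illustration only
est <- c(1975, 0, 18, 2003, NA)
recode_year_built(est, method = "na")
recode_year_built(est, method = "code")
```

Converting to NA avoids implying a precise year, while recoding to the cutoff keeps the tract in the data at the cost of hiding that the value is censored.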
I can't imagine a situation where this would be on purpose -- we did some searching to see if it was intentional and came up with nothing but it's also surprising that this escaped notice.
I'm a bit mixed. It's not really your job to fix Census-related issues, right? And then you're on the hook for these changes. I wonder if just adding a warning for users when they pull that variable might be most appropriate? I don't know what is best here.
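For what it's worth, the warning idea might look something like the sketch below. The function name `warn_if_coded` is hypothetical and the codes 0 and 18 are assumed from the 2020 5-year results discussed above; this is not how tidycensus behaves today.

```r
# Hypothetical check, not part of tidycensus.
# Assumes 0 and 18 are the coded stand-ins for "1939-" and "2014+".
warn_if_coded <- function(estimate, codes = c(0, 18)) {
  n_coded <- sum(estimate %in% codes)
  if (n_coded > 0) {
    warning(
      n_coded,
      " estimate(s) look like top- or bottom-coded values rather than years.",
      call. = FALSE
    )
  }
  # Return the input unchanged so the check can sit inside a pipeline
  invisible(estimate)
}

# Made-up estimates for illustration only; this call emits a warning
warn_if_coded(c(1975, 0, 18, 2003))
```

This leaves the values untouched and only flags them, so users decide how to handle the coded tracts.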