covid-19-data
Question: How are the 'hover' numbers on the county hot spot map calculated?
Hi, has anyone run into this issue? No matter what I try, I can't reproduce the same numbers.
Worked examples are below, using the NYT county data and Census population data. There may be some small differences in the Census population data, but not enough to throw the numbers off as badly as I'm seeing. In many cases my result is close, but it's wrong often enough that I assume I'm doing something wrong. I just can't see what it is.
Example 1.
Example 2.
Hi @jazz788,
Looking at your screenshots, I see at least two differences.
- Your column for "7 days prior" appears to be using an 8-day range.
- Our charts use a slightly different population dataset: the 2014-2018 5-year ACS data.
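A minimal sketch of the first point, assuming the hover figure is a trailing 7-day average of daily new cases per 100,000 residents. The thread does not state the exact definition, so the per-100,000 scale and the function name are illustrative assumptions. Because the NYT county file is cumulative, a 7-day window of new cases spans 8 records: the last day and the day 7 days before it.

```python
from datetime import date, timedelta


def trailing_7day_avg_per_100k(cumulative, day, population):
    """Average daily new cases over the 7 days ending on `day`, per 100,000 people.

    `cumulative` maps a date to the cumulative case count on that date.
    The window needs 8 cumulative records: `day` and `day - 7`.
    Raises KeyError if either record is missing.
    """
    start = day - timedelta(days=7)  # the record just before the 7-day window
    new_cases = cumulative[day] - cumulative[start]
    daily_avg = new_cases / 7
    return daily_avg / population * 100_000
```

Taking the difference against `day - 6` instead would cover only 6 days of new cases, which is one easy way to end up slightly off.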
Hi @albertsun
Thanks so much for the input.
On difference 1: I tried everything I could think of. Here it is again with a 7-day range, and it's still significantly off.
It's not just these two counties; they are only samples. For at least half the data, maybe more, I'm doing something wrong and can't get the numbers to match. I'd greatly appreciate any further insight.
On your second point, I'm happy to use the revised population data. I haven't found that dataset yet and am still looking, but the differences in the popup are large, and I'm very sure that marginal population differences won't bridge the gap, whatever it is I'm doing wrong here.
Thanks so much.
Quoting @albertsun: "Your column for '7 days prior' appears to be using an 8-day range."
An 8-day range of records corresponds to the average over 7 days of the individual day-to-day differences, i.e. the daily new cases: mathematically, the intermediate terms cancel and only the two end-day terms remain. (Daily increases are so noisy that they are hard to follow on a graph.) A 1-day difference spans 2 daily records, so a 7-day difference, done correctly, involves 8 records (an 8-day range).
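The cancellation can be written out. With C_t denoting the cumulative count on day t (notation introduced here for illustration), the average of the last seven daily differences telescopes:

```latex
\frac{1}{7}\sum_{k=0}^{6}\left(C_{t-k} - C_{t-k-1}\right)
  = \frac{1}{7}\left(C_t - C_{t-7}\right)
```

The sum touches C_t through C_{t-7}, which are the 8 records mentioned above.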
Personally, I take 1-day record differences (previous day, this day) to get daily new cases, using a function that returns NULL if either day's record is bad (and I treat the missing 0th record as 0). This dataset is extremely clean in that regard, though, so that step is probably unnecessary. I then average those daily values over 7 days (3 days back and 3 days forward, so the window is centered), again ignoring any NULL values, to get the "7-day average". This is easier to understand, but it takes much more calculation to get exactly the same simple result.
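A minimal sketch of that procedure, assuming bad records are stored as `None` (the function names and that representation are illustrative, not part of the NYT data). Days without a full 7-day window at either end of the series are returned as `None`, which is a design choice for this sketch rather than something stated above.

```python
from typing import List, Optional, Sequence


def daily_new(cumulative: Sequence[Optional[int]]) -> List[Optional[int]]:
    """1-day differences (previous day, this day) of a cumulative series.

    The difference is None if either day's record is bad (None).
    The missing record before day 0 is assumed to be 0.
    """
    out: List[Optional[int]] = []
    prev: Optional[int] = 0  # assumed value of the missing 0th record
    for cur in cumulative:
        out.append(None if cur is None or prev is None else cur - prev)
        prev = cur
    return out


def centered_7day_avg(daily: Sequence[Optional[int]]) -> List[Optional[float]]:
    """Mean of days i-3..i+3, skipping None values inside the window."""
    out: List[Optional[float]] = []
    for i in range(len(daily)):
        if i < 3 or i + 3 >= len(daily):
            out.append(None)  # incomplete window at the start or end
            continue
        values = [v for v in daily[i - 3 : i + 4] if v is not None]
        out.append(sum(values) / len(values) if values else None)
    return out
```

On clean data (no `None` values), the result for day i is the same as differencing the cumulative series across the window and dividing by 7.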
@jazz788, I realize now that you are calculating an average that is lagged by 4 days. Instead, center it as I described: use -3 days to +3 days for the 7-day range (in cumulative records, start at -4 the way you do, because the increase from day -4 to day -3 represents the new cases on day -3). That change may overcompensate relative to NYT, so it may not be "the" answer.
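The index arithmetic above can be sketched directly on the cumulative series. This is a minimal illustration with hypothetical function names: a centered 7-day average for day i uses records i + 3 and i - 4, while a trailing average ending on day i uses records i and i - 7.

```python
from typing import Optional, Sequence


def centered_from_cumulative(cumulative: Sequence[int], i: int) -> Optional[float]:
    """7-day average of new cases centered on day i: (C[i+3] - C[i-4]) / 7.

    C[i-4] is the record just before the window, so the increase from
    day i-4 to day i-3 counts as the new cases on day i-3.
    """
    if i - 4 < 0 or i + 3 >= len(cumulative):
        return None  # the window runs off either end of the series
    return (cumulative[i + 3] - cumulative[i - 4]) / 7


def trailing_from_cumulative(cumulative: Sequence[int], i: int) -> Optional[float]:
    """7-day average of new cases ending on day i: (C[i] - C[i-7]) / 7."""
    if i - 7 < 0 or i >= len(cumulative):
        return None
    return (cumulative[i] - cumulative[i - 7]) / 7
```

With these conventions, `trailing_from_cumulative(c, i)` equals `centered_from_cumulative(c, i - 3)`, so a trailing series is the centered one shifted later in time. Whether NYT centers or trails its average is not stated in this thread.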
@albertsun has also commented on another thread that NYT switched to the Census County Population Totals: 2010-2019 for the county populations used in per-capita reporting.
I'm interested in that population data on a county-by-county basis, edited to use identical county names (or codes) so it can be joined directly with the main tables here to get the population needed for per-capita reporting by county. I think I can clean that Census data to match the county names here, but I haven't yet checked how large a job that will be. Most of it is just removing the extraneous " County" from the county names so they match these datasets.
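A minimal sketch of that cleanup and join, assuming both tables have already been loaded as lists of dicts. The keys `state`, `county`, and `population` are hypothetical placeholders, not the actual Census column names. Stripping " County" follows the description above; also stripping " Parish" is an extra assumption, and other suffixes may need handling. Joining on county codes, where both files carry them, avoids name matching altogether.

```python
from typing import Dict, Iterable, List, Tuple

# Suffixes removed from Census county names. " County" comes from the thread;
# " Parish" is an assumption, and this list is not a complete rule set.
SUFFIXES = (" County", " Parish")


def normalize_county(name: str) -> str:
    """Strip one known suffix so a Census name matches the NYT county name."""
    for suffix in SUFFIXES:
        if name.endswith(suffix):
            return name[: -len(suffix)]
    return name


def population_lookup(census_rows: Iterable[dict]) -> Dict[Tuple[str, str], int]:
    """Map (state, normalized county) to population (hypothetical keys)."""
    return {
        (row["state"], normalize_county(row["county"])): int(row["population"])
        for row in census_rows
    }


def join_population(nyt_rows: Iterable[dict], pop: Dict[Tuple[str, str], int]) -> List[dict]:
    """Copy each NYT row with a `population` field; None marks an unmatched county."""
    return [dict(row, population=pop.get((row["state"], row["county"]))) for row in nyt_rows]
```

Listing the rows whose `population` is `None` after the join shows how large the remaining manual cleanup is.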