New York state-level counts and deaths are wrong
Location, date, and short issue description
New York state-level counts and deaths are wrong starting on May 30
File
https://coronadatascraper.com/timeseries.csv
Issue details
On May 30, the number of cases suddenly increases by about 200,000 (from 166,308 to 369,660 in the output below). Deaths jump correspondingly, except on days when no testing is reported. Clue?
Snippet/screenshot
> covid.csv.url <- "https://coronadatascraper.com/timeseries.csv"
> ts <- read_csv(covid.csv.url, col_types=cols_only(level='c',city='c',county='c',state='c',country='c',population='d',date='D',cases='d',deaths='d',tested='d'))
> filter(ts, level=='state' & state=='New York' & date > '2020-05-25') %>% select(state,cases, deaths, tested, date) %>% print(n=20)
# A tibble: 34 x 5
   state     cases deaths  tested date
   <chr>     <dbl>  <dbl>   <dbl> <date>
 1 New York 164558   8446      NA 2020-05-26
 2 New York 165020   8495      NA 2020-05-27
 3 New York 165705   8543      NA 2020-05-28
 4 New York 166308   8575      NA 2020-05-29
 5 New York 369660  23848 2005381 2020-05-30
 6 New York 167490   8649      NA 2020-05-31
 7 New York 371711  23959 2113777 2020-06-01
 8 New York 373040  24023 2167831 2020-06-02
 9 New York 374085  24079 2229473 2020-06-03
10 New York 375133  24133 2293032 2020-06-04
11 New York 376208  24175 2359512 2020-06-05
12 New York 377316  24212 2437407 2020-06-06
13 New York 378097  24259 2497842 2020-06-07
14 New York 171469   8883      NA 2020-06-08
15 New York 379482  24348 2605869 2020-06-09
16 New York 380156  24404 2668166 2020-06-10
17 New York 380892  24442 2729005 2020-06-11
18 New York 381714  24495 2801400 2020-06-12
19 New York 382630  24527 2872240 2020-06-13
20 New York 383324  24551 2934599 2020-06-14
# … with 14 more rows
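
Since cases and deaths are cumulative, any day-over-day decrease is a red flag. A minimal sketch of how such rows could be flagged, using four rows copied from the output above; the data frame `ny` and the column names `case_change` and `death_change` are illustrative and not part of the timeseries file:

```r
library(dplyr)

# Four New York rows copied from the output above.
ny <- data.frame(
  date   = as.Date(c("2020-05-29", "2020-05-30", "2020-05-31", "2020-06-01")),
  cases  = c(166308, 369660, 167490, 371711),
  deaths = c(8575, 23848, 8649, 23959),
  tested = c(NA, 2005381, NA, 2113777)
)

# Day-over-day change in the cumulative counts; a negative change means
# the cumulative total went down, which should not happen.
drops <- ny %>%
  arrange(date) %>%
  mutate(
    case_change  = cases - lag(cases),
    death_change = deaths - lag(deaths)
  ) %>%
  filter(case_change < 0 | death_change < 0)

print(drops)
```

On these four rows, the only flagged day is 2020-05-31, which is also a day with no `tested` value.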

As of tonight, New York's cases now exhibit the same up-and-down pattern as deaths. Indiana shows a similar pattern.
ts <- read_csv(covid.csv.url, col_types=cols_only(level='c',city='c',county='c',state='c',country='c',population='d',date='D',cases='d',deaths='d',tested='d'))
ggplot(filter(ts, country=='United States' & level=='state' & state %in% c('New York','New Jersey','Indiana')), aes(x=date, y=deaths, color=state)) + geom_line()
Hi @dkulp2, following up.
The cases jumping up and down is likely caused by sources of different priority/rating being used in the final reports. This problem hasn't really been fixed in the new Li reports (at https://covidatlas.com/data), but we're using priority only for reports.
In v2 reports, I'm probably going to simply provide all data, in addition to doing some kind of priority-merged total. If you need that soon-ish, we could actually get that out sooner b/c it will be a brand new v1 report.
Can you check and let me know if the reports exhibit the same behaviour? Cheers, jz
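
A rough way to probe the different-sources explanation on the numbers posted in this issue. This is only a sketch, and it assumes (unconfirmed) that rows with a `tested` value come from one source and rows without it come from another; `has_tested` and `by_source` are illustrative names:

```r
library(dplyr)

# Six New York rows copied from the output earlier in this issue.
ny <- data.frame(
  date   = as.Date(c("2020-05-28", "2020-05-29", "2020-05-30",
                     "2020-05-31", "2020-06-01", "2020-06-02")),
  cases  = c(165705, 166308, 369660, 167490, 371711, 373040),
  deaths = c(8543, 8575, 23848, 8649, 23959, 24023),
  tested = c(NA, NA, 2005381, NA, 2113777, 2167831)
)

# ASSUMPTION: presence of `tested` is used as a stand-in for "which source".
# Within each group, check that the cumulative counts never decrease.
by_source <- ny %>%
  mutate(has_tested = !is.na(tested)) %>%
  group_by(has_tested) %>%
  arrange(date, .by_group = TRUE) %>%
  summarise(
    rows             = n(),
    cases_monotonic  = all(diff(cases) >= 0),
    deaths_monotonic = all(diff(deaths) >= 0)
  )

print(by_source)
```

If each group is monotonic on its own while the combined series is not, that is consistent with two interleaved sources, though it does not prove it.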