
New York state-level case counts and deaths are wrong

Open dkulp2 opened this issue 4 years ago • 2 comments

Location, date, and short issue description

New York state-level case counts and deaths are wrong starting on May 30, 2020.

File

https://coronadatascraper.com/timeseries.csv

Issue details

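On May 30, the number of cases suddenly jumps by about 203,000 (from 166,308 to 369,660), and deaths jump correspondingly (from 8,575 to 23,848). On days when no testing is reported (May 31 and June 8), both cases and deaths fall back to the lower series. Clue?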

Snippet/screenshot

> covid.csv.url <- "https://coronadatascraper.com/timeseries.csv"
> ts <- read_csv(covid.csv.url, col_types=cols_only(level='c',city='c',county='c',state='c',country='c',population='d',date='D',cases='d',deaths='d',tested='d'))
> filter(ts, level=='state' & state=='New York' & date > '2020-05-25') %>% select(state,cases, deaths, tested, date) %>% print(n=20)
# A tibble: 34 x 5
   state     cases deaths  tested date      
   <chr>     <dbl>  <dbl>   <dbl> <date>    
 1 New York 164558   8446      NA 2020-05-26
 2 New York 165020   8495      NA 2020-05-27
 3 New York 165705   8543      NA 2020-05-28
 4 New York 166308   8575      NA 2020-05-29
 5 New York 369660  23848 2005381 2020-05-30
 6 New York 167490   8649      NA 2020-05-31
 7 New York 371711  23959 2113777 2020-06-01
 8 New York 373040  24023 2167831 2020-06-02
 9 New York 374085  24079 2229473 2020-06-03
10 New York 375133  24133 2293032 2020-06-04
11 New York 376208  24175 2359512 2020-06-05
12 New York 377316  24212 2437407 2020-06-06
13 New York 378097  24259 2497842 2020-06-07
14 New York 171469   8883      NA 2020-06-08
15 New York 379482  24348 2605869 2020-06-09
16 New York 380156  24404 2668166 2020-06-10
17 New York 380892  24442 2729005 2020-06-11
18 New York 381714  24495 2801400 2020-06-12
19 New York 382630  24527 2872240 2020-06-13
20 New York 383324  24551 2934599 2020-06-14
# … with 14 more rows
[Figure: NY State cumulative cases and deaths]
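Because `cases` and `deaths` are cumulative, they should never decrease from one day to the next, so flagging any day-over-day drop is a quick way to locate the suspect rows. A minimal base-R sketch, assuming the first 15 rows printed above are copied into a data frame by hand; the helper `flag_decreases` is hypothetical and not part of coronadatascraper:

```r
# Rows 1-15 of the tibble printed above (New York, state level).
ny <- data.frame(
  date = as.Date(c("2020-05-26", "2020-05-27", "2020-05-28", "2020-05-29",
                   "2020-05-30", "2020-05-31", "2020-06-01", "2020-06-02",
                   "2020-06-03", "2020-06-04", "2020-06-05", "2020-06-06",
                   "2020-06-07", "2020-06-08", "2020-06-09")),
  cases = c(164558, 165020, 165705, 166308, 369660, 167490, 371711, 373040,
            374085, 375133, 376208, 377316, 378097, 171469, 379482),
  deaths = c(8446, 8495, 8543, 8575, 23848, 8649, 23959, 24023,
             24079, 24133, 24175, 24212, 24259, 8883, 24348),
  tested = c(NA, NA, NA, NA, 2005381, NA, 2113777, 2167831,
             2229473, 2293032, 2359512, 2437407, 2497842, NA, 2605869)
)

# Hypothetical helper: add the day-over-day change in cumulative cases and
# flag the days on which the cumulative total went down.
flag_decreases <- function(df) {
  df <- df[order(df$date), ]
  df$case_change <- c(NA, diff(df$cases))
  df$decrease <- !is.na(df$case_change) & df$case_change < 0
  df
}

res <- flag_decreases(ny)
print(res[res$decrease, c("date", "cases", "deaths", "tested")])
```

On these rows it flags 2020-05-31 and 2020-06-08, the two dates in the range where `tested` is `NA` and both totals fall back to the lower series.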

dkulp2, Jun 29 '20 14:06

As of tonight, New York's cases show the same up-and-down pattern as deaths. Indiana shows a similar pattern.

library(tidyverse)  # loads readr (read_csv), dplyr (filter), ggplot2
covid.csv.url <- "https://coronadatascraper.com/timeseries.csv"
ts <- read_csv(covid.csv.url, col_types=cols_only(level='c',city='c',county='c',state='c',country='c',population='d',date='D',cases='d',deaths='d',tested='d'))
ggplot(filter(ts, country=='United States' & level=='state' & state %in% c('New York','New Jersey','Indiana')), aes(x=date, y=deaths, color=state)) + geom_line()

[Figures: Rplot01, Rplot02]
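One way to test the "two interleaved series" reading of these plots is to split a state's rows by whether `tested` was reported and check that each part is non-decreasing on its own, even though the combined series is not. A rough base-R sketch; the helper `split_is_monotone` is hypothetical, and splitting on `is.na(tested)` is an assumption drawn from the New York rows in the original report that may not separate the series for other states such as Indiana:

```r
# Hypothetical helper: split one state's rows by whether `tested` is reported
# and return, for each part, whether cumulative cases never decrease.
split_is_monotone <- function(df) {
  df <- df[order(df$date), ]
  group <- ifelse(is.na(df$tested), "no_tested", "with_tested")
  parts <- split(df$cases, group)
  vapply(parts, function(x) all(diff(x) >= 0), logical(1))
}

# A few New York rows copied from the issue (May 29 to June 2, 2020).
ny <- data.frame(
  date = as.Date(c("2020-05-29", "2020-05-30", "2020-05-31",
                   "2020-06-01", "2020-06-02")),
  cases = c(166308, 369660, 167490, 371711, 373040),
  tested = c(NA, 2005381, NA, 2113777, 2167831)
)

print(split_is_monotone(ny))   # each part on its own
print(all(diff(ny$cases) >= 0)) # the combined series
```

Both parts come back `TRUE` while the combined series does not, which is consistent with two separately cumulative sources being interleaved in the same state-level series.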

dkulp2, Jul 02 '20 04:07

Hi @dkulp2, following up.

The up-and-down jumps in cases are likely caused by sources with different priorities/ratings being used in the final reports. This problem hasn't really been fixed in the new Li reports (at https://covidatlas.com/data), though we're now using priority only for reports.

In v2 reports, I'm probably going to simply provide all data, along with some kind of priority-merged total. If you need that soon-ish, we could get it out sooner b/c it will be a brand-new v1 report.
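A "priority-merged total" could look roughly like the following: keep every source's rows (the "all data" view) and, for each date, select the row from the highest-priority source. This is a minimal base-R sketch under stated assumptions: the column names `source` and `priority`, the convention that a larger `priority` wins, the helper `priority_merge`, and all the numbers are hypothetical, not the Li schema or real counts.

```r
# Toy input (made-up numbers): two hypothetical sources reporting cumulative
# cases for one state; source_b has no row for 2020-05-31.
all_data <- data.frame(
  date = as.Date(c("2020-05-30", "2020-05-30", "2020-05-31",
                   "2020-06-01", "2020-06-01")),
  source = c("source_a", "source_b", "source_a", "source_a", "source_b"),
  priority = c(1, 2, 1, 1, 2),
  cases = c(100, 250, 110, 120, 270),
  stringsAsFactors = FALSE
)

# Hypothetical helper: for each date, keep only the row from the source with
# the highest priority (larger number = preferred).
priority_merge <- function(df) {
  df <- df[order(df$date, -df$priority), ]
  merged <- df[!duplicated(df$date), ]
  rownames(merged) <- NULL
  merged
}

print(priority_merge(all_data))
```

When the preferred source has no row for a date (2020-05-31 in this toy input), the merge falls back to the other source and the total dips from 250 to 110 before recovering to 270, which is one way the up-and-down pattern described above could arise.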

Can you check and let me know if the reports exhibit the same behaviour? Cheers, jz

jzohrab, Aug 09 '20 18:08