covid_age

US New Jersey Data

Open mpascariu opened this issue 3 years ago • 2 comments

In Output_10.csv, for the state of New Jersey, I can see several weeks in a row with the same records, day after day. Does this indicate zero new cases in the period/region, or a lack of new data?

I did this:

> library(tidyverse)  # for dplyr (%>%, filter, mutate) and tidyr (pivot_wider)
> NJ <- read.csv(
+   file = "Output_10_20200901.csv",
+   header = TRUE,
+   skip = 3
+ ) %>%
+   filter(Region == "New Jersey",
+          Sex == "b")  %>%
+   mutate(
+     Date = as.Date(Date, format = "%d.%m.%Y"))
> 
> 
> NJ %>% select(Region, Date, Age, Cases) %>%
+   filter(Date > "2020-08-01") %>% 
+   pivot_wider(names_from = Date, values_from = Cases)
# A tibble: 11 x 24
   Region   Age `2020-08-02` `2020-08-03` `2020-08-04` `2020-08-05` `2020-08-06` `2020-08-07` `2020-08-08`
   <fct>  <int>        <dbl>        <dbl>        <dbl>        <dbl>        <dbl>        <dbl>        <dbl>
 1 New J~     0       1967.        1967.        1967.        1967.        1967.        1967.        1967. 
 2 New J~    10       4962.        4962.        4962.        4962.        4962.        4962.        4962. 
 3 New J~    20      23854.       23854.       23854.       23854.       23854.       23854.       23854. 
 4 New J~    30      31052.       31052.       31052.       31052.       31052.       31052.       31052. 
 5 New J~    40      26123        26123        26123        26123        26123        26123        26123  
 6 New J~    50      34429.       34429.       34429.       34429.       34429.       34429.       34429. 
 7 New J~    60      26415.       26415.       26415.       26415.       26415.       26415.       26415. 
 8 New J~    70      14328.       14328.       14328.       14328.       14328.       14328.       14328. 
 9 New J~    80       9587.        9587.        9587.        9587.        9587.        9587.        9587. 
10 New J~    90       8184.        8184.        8184.        8184.        8184.        8184.        8184. 
11 New J~   100         70.2         70.2         70.2         70.2         70.2         70.2         70.2
# ... with 15 more variables: `2020-08-09` <dbl>, `2020-08-10` <dbl>, `2020-08-11` <dbl>,
#   `2020-08-12` <dbl>, `2020-08-13` <dbl>, `2020-08-14` <dbl>, `2020-08-19` <dbl>, `2020-08-20` <dbl>,
#   `2020-08-21` <dbl>, `2020-08-22` <dbl>, `2020-08-23` <dbl>, `2020-08-24` <dbl>, `2020-08-25` <dbl>,
#   `2020-08-26` <dbl>, `2020-08-27` <dbl>
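For what it's worth, runs of identical records like the ones in the transcript above can also be flagged programmatically rather than by eyeballing the wide table. A minimal sketch in base R (the function name and threshold are made up for the example; long runs of an unchanged cumulative count suggest a stale source rather than genuinely zero new cases):

```r
# For one age group, flag runs of consecutive days on which the
# cumulative case count did not change.
flag_stale_runs <- function(dates, cases, min_run = 7) {
  ord <- order(dates)
  r <- rle(cases[ord])           # run-length encoding of repeated values
  stale <- r$lengths >= min_run  # runs of at least `min_run` identical records
  data.frame(value = r$values[stale], run_length = r$lengths[stale])
}

# Toy example: a value stuck for 10 days, then three updates
d <- as.Date("2020-08-01") + 0:12
x <- c(rep(1967, 10), 2001, 2005, 2010)
flag_stale_runs(d, x)
# reports the value 1967 with run_length 10
```

Applied per `Age` group (e.g. inside `group_by(Age)`), this would surface exactly the stuck stretch visible in the output above.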

mpascariu · Sep 01 '20 09:09

I spot-checked the screenshots of the dashboard from which the data were captured. Indeed, in early August the "last update" note was stuck at July 30 for a series of days (range TBD). We will amend this in the input data, so in effect it will become a calendar gap in the data. This is one of the dashboards that are captured manually, so it is an easily understandable human error.

One could, however, interpolate fractions between the left and right endpoints of the gap and rescale them to the crude totals, which are more likely to be available. So far we're not doing that sort of thing in the database, but it is among the options that may be forthcoming. Your opinion on the matter would be most welcome.
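To illustrate the interpolate-and-rescale idea mentioned above (just a sketch, not database code; the function name and arguments are invented for the example): linearly interpolate the age-specific fractions between the last day before the gap and the first day after it, then rescale each interpolated day so the age groups sum to the known crude total.

```r
# `left` and `right` are age-specific counts on the last day before and
# the first day after the gap; `totals` are the crude totals for the
# days inside the gap (assumed to be available).
interp_rescale <- function(left, right, totals) {
  f_left  <- left  / sum(left)   # age fractions at the left edge
  f_right <- right / sum(right)  # age fractions at the right edge
  n <- length(totals)
  out <- matrix(NA_real_, nrow = n, ncol = length(left))
  for (i in seq_len(n)) {
    w <- i / (n + 1)                     # interpolation weight for day i
    f <- (1 - w) * f_left + w * f_right  # interpolated fractions
    out[i, ] <- f / sum(f) * totals[i]   # rescale to the crude total
  }
  out
}

# Toy example: two age groups, a two-day gap
interp_rescale(left = c(80, 20), right = c(60, 40), totals = c(110, 120))
```

By construction each filled day matches its crude total exactly, while the age pattern drifts smoothly from the left edge to the right edge of the gap.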

timriffe · Sep 01 '20 09:09

In my analyses I also apply linear interpolation when the gap of missing data is small (1 or 2 time intervals). Since I am mostly working with weekly data now, that means 1 or 2 weeks. However, if the gap is larger, different kinds of interpolation might be more appropriate, provided that there are sufficient data before AND after the gap. As a rule we could require "twice the number of populated time intervals before and after the gap in order to apply the interpolation", or something like that.

Otherwise, I consider it more appropriate to simply acknowledge the missing information and leave it up to the user to model it.
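A concrete version of that rule might look like the sketch below (helper names are made up; the threshold encodes the "twice the populated intervals" idea, and large gaps are left as NA for the user to model):

```r
# Rule of thumb: only interpolate a gap of `gap_len` missing intervals
# if there are at least twice that many populated intervals on each side.
ok_to_interpolate <- function(gap_len, before, after) {
  before >= 2 * gap_len && after >= 2 * gap_len
}

# Linear fill for a numeric series with internal NAs, applied only if
# every gap is short enough (base R `approx`, no extra packages).
fill_small_gaps <- function(x, max_gap = 2) {
  na_runs <- rle(is.na(x))
  if (any(na_runs$lengths[na_runs$values] > max_gap)) {
    return(x)  # at least one gap is too large: leave the series alone
  }
  idx <- seq_along(x)
  approx(idx[!is.na(x)], x[!is.na(x)], xout = idx)$y
}

fill_small_gaps(c(10, NA, NA, 40, 50))  # 2-interval gap: filled to 10, 20, 30, 40, 50
fill_small_gaps(c(10, NA, NA, NA, 50))  # 3-interval gap: returned unchanged, NAs kept
```

For weekly series the same code applies with intervals read as weeks; only the `max_gap` threshold would change.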

mpascariu · Sep 01 '20 11:09