observed jumps in Maine cases

Open mpascariu opened this issue 3 years ago • 11 comments

Hi @timriffe,

I am looking at the confirmed cases for the state of Maine and I see periods with significant jumps. I think it is an isolated event. This might need some attention.

library(tidyverse)

# Cumulative confirmed cases over time for one Maine age group, both sexes
read_csv(
  file = "data/Output_10_20201208.zip",
  skip = 3) %>%
  mutate(
    Date = as.Date(Date, format = "%d.%m.%Y")) %>%
  filter(Sex == "b",
         Region == "Maine",
         Age == 60) %>%
  arrange(Date) %>%
  ggplot(aes(x = Date, y = Cases)) +
  geom_line(size = 1) +
  labs(title = "Confirmed cases in the 60-70 age group")

[image: Rplot]

Looking at the weekly number of cases per 100k inhabitants, we see this:

[image: C19_Cases_dev_Maine_20201209]
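
For anyone who wants to reproduce a chart of this kind: weekly cases per 100k can be derived from the cumulative counts by differencing and summing per week. This is only a rough base R sketch on made-up numbers, not necessarily how the chart above was produced. The toy series and the `population` value are assumptions.

```r
# Toy cumulative series (made up). The first day only serves as a baseline
# so that the first full week has a new-case value for every day.
toy <- data.frame(
  Date  = as.Date("2020-11-01") + 0:14,
  Cases = cumsum(c(100, 10, 12, 8, 15, 9, 11, 5, 20, 18, 25, 22, 19, 21, 15))
)
population <- 200000  # hypothetical population of the age group

# New cases per day are the first differences of the cumulative counts
toy$New <- c(NA, diff(toy$Cases))

# Label each day with the Monday that starts its week
toy$Week <- cut(toy$Date, breaks = "week")

# Sum new cases per week; the baseline day (New = NA) is dropped
weekly <- aggregate(New ~ Week, data = toy, FUN = sum)
weekly$per_100k <- weekly$New * 1e5 / population
print(weekly)
```

A spurious jump in the cumulative series shows up here as one very large weekly value followed by a negative one.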

mpascariu avatar Dec 09 '20 09:12 mpascariu

In fact it is not an isolated event. I can see this in California and Florida too.

mpascariu avatar Dec 09 '20 10:12 mpascariu

Could this be a date formatting issue?
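
One quick way to probe that hypothesis is to check which date strings fail to parse as `%d.%m.%Y`, and which ones would also be valid with day and month swapped. A minimal base R sketch; the toy strings are made up, only the format string comes from the code above.

```r
# Toy date strings (made up); the last one is deliberately in another layout
dates_raw <- c("03.10.2020", "10.03.2020", "28.09.2020", "2020-10-03")

parsed_dmy <- as.Date(dates_raw, format = "%d.%m.%Y")  # intended reading
parsed_mdy <- as.Date(dates_raw, format = "%m.%d.%Y")  # day and month swapped

check <- data.frame(
  raw        = dates_raw,
  parsed_dmy = parsed_dmy,
  # TRUE when the string does not match the expected layout at all
  unparsed   = is.na(parsed_dmy),
  # TRUE when the swapped reading is also a valid but different date,
  # so a day/month mix-up at entry time would go unnoticed by the parser
  ambiguous  = !is.na(parsed_dmy) & !is.na(parsed_mdy) &
    parsed_dmy != parsed_mdy
)
print(check)
```

Rows flagged as `ambiguous` are not errors by themselves; they only mark dates where a swap cannot be ruled out from the string alone.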

mpascariu avatar Dec 09 '20 10:12 mpascariu

Thanks for reporting, Marius. Your observations have been passed on to the respective collectors. Date formatting is a possibility. Will let you know as soon as it's fixed.

timriffe avatar Dec 10 '20 08:12 timriffe

Thanks Tim! Here's a view over all US states:


library(tidyverse)

# One panel per US region, one line per age group.
# Cumulative counts should never decrease, so any dip marks a problem.
p <- read_csv(
  file = "data/Output_10_20201208.zip",
  skip = 3) %>%
  mutate(Date = as.Date(Date, format = "%d.%m.%Y"),
         Age = as.factor(Age)) %>%
  arrange(Date) %>%
  filter(Sex == "b",
         Country == "USA",
         # Age %in% c(60, 70, 80),
         Cases > 0) %>%
  ggplot(aes(x = Date, y = Cases, color = Age)) +
  geom_line(size = 1) +
  # Free scales, since case counts differ a lot between regions
  facet_wrap(~ Region, scales = "free", ncol = 3) +
  scale_y_continuous(labels = scales::label_number_si(accuracy = 0.1)) +
  labs(title = "Monotonicity of confirmed cases, USA") +
  theme(legend.position = "top")

# Tall figure so that all panels stay readable
ggsave("chart.png", p, width = 8, height = 18)

[image: chart]
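
The same diagnostic can be run numerically instead of by eye, since cumulative counts should never decrease within a series. A minimal base R sketch, assuming the columns used above (`Region`, `Age`, `Date`, `Cases`); the toy data and the helper name `flag_decreases` are made up for illustration.

```r
# Hypothetical helper: return the rows where cumulative Cases drop relative
# to the previous date within the same Region and Age series.
flag_decreases <- function(df) {
  df <- df[order(df$Region, df$Age, df$Date), ]
  series <- interaction(df$Region, df$Age, drop = TRUE)
  # Change since the previous date; the first row of each series gets NA
  df$Change <- ave(df$Cases, series, FUN = function(x) c(NA, diff(x)))
  df[!is.na(df$Change) & df$Change < 0, ]
}

# Toy data (made up): the Maine series has a spurious spike on 2020-10-03
toy <- data.frame(
  Region = rep(c("Maine", "Vermont"), each = 4),
  Age    = 60,
  Date   = rep(as.Date("2020-10-01") + 0:3, times = 2),
  Cases  = c(100, 105, 160, 110, 50, 52, 55, 59)
)

violations <- flag_decreases(toy)
print(violations)
```

Note that the flagged row is the drop after the spike, so the suspicious entry is usually the date just before it.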

mpascariu avatar Dec 10 '20 10:12 mpascariu

OK, this is a good diagnostic. I'm going through the states one by one and making a checklist.

  • [x] Arizona (fixed: 0 and 6 in wrong order for age 20, Oct 3 caused it)
  • [x] California (fixed: entry errors now overwritten, source moving to automatic collection with full series refresh)
  • [x] Florida (entry errors fixed in several dates. Digit swaps, that sort of thing.)
  • [x] Idaho (entry errors fixed 08-Oct-2020 and 13-Nov-2020)
  • [x] Iowa (change in age groups: age harmonization each day is currently independent. Needs upgrade to take time series into account. No current fix for these ruptures, other than carrying back the greater detail that starts on Nov 13th to earlier dates)
  • [x] Louisiana
  • [ ] Maine (still investigating)
  • [x] NYC (similar story to Iowa. Not an apparent entry error)
  • [x] Vermont (fixed: source gives wrong daily total on 28 Sept, removed total entry)

timriffe avatar Dec 10 '20 11:12 timriffe

Hi @timriffe, I can see that most of the data for the states of Iowa, California and Washington disappeared altogether from the 07-01-2021 version of the database. Only a few weeks of data are left for each state. Was that done on purpose?

mpascariu avatar Jan 07 '21 09:01 mpascariu

Thanks for reporting! Not on purpose. I'm investigating these one at a time.

  • [x] California [changed age codes used, duplicates removed, will appear in output tomorrow]
  • [x] Iowa [source (iowacovid19tracker.org) no longer gives this, data rolled back to most recent complete capture (June - mid Dec 2020)]
  • [x] Washington [Drive sheet fixed, should be back tomorrow]

timriffe avatar Jan 07 '21 11:01 timriffe

California and Washington look good on 08-01-2021; however, the Iowa data still show major gaps between June and September.

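On December 9 I was able to produce this: [image: C19_Cases_dev_Iowa_20201209] https://user-images.githubusercontent.com/6264977/104008213-6de55100-51a9-11eb-8b22-f51246068547.png

Today I can see this: [image: C19_Cases_dev_Iowa_20210108] https://user-images.githubusercontent.com/6264977/104008262-7f2e5d80-51a9-11eb-9343-318b73b61df0.png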
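
Losses like this could be caught automatically by comparing how many distinct dates each region has in two versions of the database. A minimal base R sketch, assuming both versions have `Region` and `Date` columns; the toy snapshots and the helper name `coverage` are made up.

```r
# Hypothetical helper: number of distinct dates available per Region
coverage <- function(df) {
  n <- tapply(df$Date, df$Region, function(d) length(unique(d)))
  data.frame(Region = names(n), n_dates = as.integer(n))
}

# Toy snapshots (made up): Iowa loses its earliest dates in the newer one
old <- data.frame(
  Region = rep(c("Iowa", "Washington"), each = 5),
  Date   = rep(as.Date("2020-12-01") + 0:4, times = 2)
)
new <- old[!(old$Region == "Iowa" & old$Date < as.Date("2020-12-04")), ]

cmp <- merge(coverage(old), coverage(new), by = "Region",
             suffixes = c("_old", "_new"), all.x = TRUE)
# Regions missing entirely from the new snapshot count as zero dates
cmp$n_dates_new[is.na(cmp$n_dates_new)] <- 0
cmp$lost <- cmp$n_dates_old - cmp$n_dates_new

# Regions that lost dates between the two versions
print(cmp[cmp$lost > 0, ])
```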

mpascariu avatar Jan 08 '21 11:01 mpascariu

Thanks @mpascariu. I did a manual roll-back yesterday in Drive, as automatic captures had been failing for Iowa. Looks like I chose the wrong date. I've been in contact with the source, who tells me the sheet will be released again soon. This will completely overwrite the Iowa series, FYI. It could be a few days before that makes it through. I'll therefore roll back to the sheet status the day prior to Dec 9 and hopefully you'll get that same data back.

timriffe avatar Jan 08 '21 11:01 timriffe

ok, great!

mpascariu avatar Jan 08 '21 11:01 mpascariu

The monotonicity issues are not limited to the US regions; they also occur at the country level across the entire database. I have spotted them in various countries.

But maybe a new issue should be opened for this?
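
To get a first overview of how widespread this is, one could compute the share of series per country that contain at least one decrease. A rough base R sketch on made-up data, assuming the `Country`, `Region`, `Age`, `Date` and `Cases` columns used earlier in this thread:

```r
# Toy cumulative series (made up), three dates per series
toy <- data.frame(
  Country = rep(c("USA", "USA", "Germany"), each = 3),
  Region  = rep(c("Maine", "Vermont", "All"), each = 3),
  Age     = 60,
  Date    = rep(as.Date("2021-01-01") + 0:2, times = 3),
  Cases   = c(10, 30, 12, 5, 6, 7, 40, 41, 39)
)

toy <- toy[order(toy$Country, toy$Region, toy$Age, toy$Date), ]
series <- interaction(toy$Country, toy$Region, toy$Age, drop = TRUE)

# TRUE on every row of a series that contains at least one decrease
toy$has_drop <- ave(toy$Cases, series,
                    FUN = function(x) any(diff(x) < 0)) == 1

# One row per series, then the share of affected series per country
per_series <- unique(toy[, c("Country", "Region", "Age", "has_drop")])
summary_tbl <- aggregate(has_drop ~ Country, data = per_series, FUN = mean)
names(summary_tbl)[2] <- "share_with_drop"
print(summary_tbl)
```

Sorting the result by `share_with_drop` would give a priority list for the collectors.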

mpascariu avatar Jan 08 '21 13:01 mpascariu