covid_age
observed jumps in Maine cases
Hi @timriffe,
I am looking at the confirmed cases for the state of Maine and I see periods with significant jumps. I think it is an isolated event, but it might need some attention.
library(tidyverse)
read_csv(
  file = "data/Output_10_20201208.zip",
  skip = 3) %>%
  mutate(Date = as.Date(Date, format = "%d.%m.%Y")) %>%
  filter(Sex == "b",
         Region == "Maine",
         Age == 60) %>%
  arrange(Date) %>%
  ggplot(aes(x = Date, y = Cases)) +
  geom_line(size = 1) +
  labs(title = "Confirmed cases in the 60-70 age group")
Looking at the weekly number of cases per 100k inhabitants, we see this:
In fact it is not an isolated event. I can see this in California and Florida too.
Could it be a date formatting issue?
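One quick way to probe the date-format hypothesis is to check which date strings would still parse if day and month were swapped. The sketch below is only an illustration: `is_ambiguous_date()` is a hypothetical helper, not part of covid_age, and it assumes dates are stored as `dd.mm.yyyy` strings, as in the output file read above. The example dates are made up.

```r
# Hypothetical helper: TRUE where a "dd.mm.yyyy" string would also parse
# as "mm.dd.yyyy" and give a different date, i.e. where a day/month swap
# could go unnoticed at entry time.
is_ambiguous_date <- function(x) {
  dmy <- as.Date(x, format = "%d.%m.%Y")  # intended reading
  mdy <- as.Date(x, format = "%m.%d.%Y")  # swapped reading (NA if invalid)
  !is.na(dmy) & !is.na(mdy) & dmy != mdy
}

# Made-up example strings, for illustration only
dates <- c("03.10.2020", "13.11.2020", "08.10.2020", "10.10.2020")
ambiguous <- is_ambiguous_date(dates)
data.frame(Date = dates, Ambiguous = ambiguous)
```

If the jumps cluster on dates flagged as ambiguous, a day/month swap becomes a more plausible explanation; if they also occur on dates with a day above 12, something else is likely going on.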
Thanks for reporting Marius, your observations have been reported to the respective collectors. Date formatting is a possibility. Will let you know as soon as it's fixed.
Thanks Tim! Here's a view over all US states:
library(tidyverse)
p <- read_csv(
  file = "data/Output_10_20201208.zip",
  skip = 3) %>%
  mutate(Date = as.Date(Date, format = "%d.%m.%Y"),
         Age = as.factor(Age)) %>%
  arrange(Date) %>%
  filter(Sex == "b",
         Country == "USA",
         # Age %in% c(60, 70, 80),
         Cases > 0) %>%
  ggplot(aes(x = Date, y = Cases, color = Age)) +
  geom_line(size = 1) +
  facet_wrap(~ Region, scales = "free", ncol = 3) +
  scale_y_continuous(labels = scales::label_number_si(accuracy = 0.1)) +
  labs(title = "Monotonicity of confirmed cases, USA") +
  theme(legend.position = "top")
ggsave("chart.png", p, width = 8, height = 18)
OK, this is a good diagnostic, going through one by one. Making a checklist.
- [x] Arizona (fixed: 0 and 6 in wrong order for age 20, Oct 3 caused it)
- [x] California (fixed: entry errors now overwritten, source moving to automatic collection with full series refresh)
- [x] Florida (entry errors fixed in several dates. Digit swaps, that sort of thing.)
- [x] Idaho (entry errors fixed 08-Oct-2020 and 13-Nov-2020)
- [x] Iowa (change in age groups: age harmonization each day is currently independent. Needs upgrade to take time series into account. No current fix for these ruptures, other than carrying back the greater detail that starts on Nov 13th to earlier dates)
- [x] Louisiana
- [ ] Maine (still investigating)
- [x] NYC (similar story to Iowa. Not an apparent entry error)
- [x] Vermont (fixed: source gives wrong daily total on 28 sept, removed total entry)
Hi @timriffe, I can see that most of the data for the states of Iowa, California and Washington disappeared altogether from the 07-01-2021 version of the database. Only a few weeks of data are left for each state. Was that done on purpose?
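A loss like this can be quantified by summarising date coverage per region and comparing the summary between two versions of the output file. The following is a minimal sketch in base R, not code from the covid_age pipeline: `date_coverage()` is a hypothetical helper, it assumes `Date` has already been converted to class `Date`, and the toy rows are made up.

```r
# Hypothetical helper: per series, report the first and last date, the
# number of distinct dates, and the longest gap (in days) between
# consecutive dates.
date_coverage <- function(df, keys = c("Country", "Region")) {
  grp <- interaction(df[keys], drop = TRUE, sep = " / ")
  out <- lapply(split(df$Date, grp), function(d) {
    d <- sort(unique(d))
    gaps <- as.numeric(diff(d))  # differences between Dates are in days
    data.frame(First = d[1],
               Last = d[length(d)],
               N_dates = length(d),
               Max_gap_days = if (length(gaps) > 0) max(gaps) else 0)
  })
  data.frame(Series = names(out), do.call(rbind, out), row.names = NULL)
}

# Made-up rows, for illustration only
toy_cov <- data.frame(
  Country = "USA",
  Region = rep(c("Iowa", "Maine"), times = c(3, 2)),
  Date = as.Date(c("2020-06-01", "2020-06-02", "2020-09-15",
                   "2020-06-01", "2020-06-02"))
)
coverage <- date_coverage(toy_cov)
coverage
```

Running this on the file from each release and comparing `N_dates` and `Max_gap_days` per region would show which series shrank or developed gaps between versions.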
Thanks for reporting! Not on purpose. I'm investigating these one at a time.
- [x] California [changed age codes used, duplicates removed, will appear in output tomorrow]
- [x] Iowa [source (iowacovid19tracker.org) no longer gives this; data rolled back to most recent complete capture (June - mid Dec 2020)]
- [x] Washington [Drive sheet fixed, should be back tomorrow]
California and Washington look good on 08-01-2021, however Iowa data still displays major gaps between June and September.
On December 9 I was able to produce this: [image: C19_Cases_dev_Iowa_20201209] https://user-images.githubusercontent.com/6264977/104008213-6de55100-51a9-11eb-8b22-f51246068547.png
Today I can see this: [image: C19_Cases_dev_Iowa_20210108] https://user-images.githubusercontent.com/6264977/104008262-7f2e5d80-51a9-11eb-9343-318b73b61df0.png
Thanks @mpascariu I did a manual roll-back yesterday in Drive, as automatic captures had been failing for Iowa. Looks like I chose the wrong date. I've been in contact with the source, who tells me the sheet will be released again soon. This will completely overwrite the Iowa series, FYI. It could be a few days before that makes it through. I'll therefore roll back to the sheet status the day prior to Dec 9 and hopefully you'll get that same data back.
ok, great!
The monotonicity issues are not limited to the US regions: they also show up at the country level across the entire database, and have been spotted in various countries.
But maybe a new issue should be opened for this (?)
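To screen the whole database rather than inspecting plots one region at a time, the check can be reduced to "does a cumulative count ever drop from one date to the next within a series?". Below is a minimal sketch of that idea in base R. `flag_decreases()` is a hypothetical helper, not part of covid_age; it assumes `Date` has already been converted to class `Date` and that a series is identified by `Country`, `Region`, `Sex` and `Age`. The toy numbers are made up.

```r
# Hypothetical helper: return the rows where a cumulative count is lower
# than on the previous date of the same series.
flag_decreases <- function(df, value = "Cases",
                           keys = c("Country", "Region", "Sex", "Age")) {
  # Sort by series, then by date, so diff() compares consecutive dates
  ord <- do.call(order, unname(as.list(df[c(keys, "Date")])))
  df <- df[ord, ]
  grp <- interaction(df[keys], drop = TRUE)
  # Change since the previous date within each series (NA on the first date)
  df$Change <- ave(df[[value]], grp, FUN = function(x) c(NA, diff(x)))
  df[!is.na(df$Change) & df$Change < 0, ]
}

# Made-up numbers, for illustration only
toy <- data.frame(
  Country = "USA", Region = "Maine", Sex = "b", Age = 60,
  Date = as.Date("2020-10-01") + 0:3,
  Cases = c(100, 120, 110, 130)
)
flagged <- flag_decreases(toy)
flagged
```

Counting flagged rows per `Country` and `Region` would give a rough ranking of which series to inspect first. A drop is not always an entry error, since sources sometimes revise counts downward, so the output is a list of candidates rather than confirmed mistakes.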