Bad data quality South America
Hi Opencovid Team
Thanks again for your efforts on gathering all the data. While looking through the data I observed some strange behaviour in various country districts and municipalities.
Especially in these countries: Argentina: La Rioja, La Pampa... There the case counts increase at the beginning and then decrease again in the middle.
Bolivia: La Paz,... Brazil: Acre,... Chile: Antofagasta,... Peru: Ancash. There the values at the beginning are consistently very wrong.
Mexico: Tlaxcala. The values jump a lot at the start of the tracking.
Poland: Greater Poland,... 13.6. There the values decrease from 2000 something to 24, and increase to 2000 again the next day.
Czechia: Prague, 13.7. No more data for deaths or recovered cases is available.
We use your data for our website to show some statistics and developments. You can have a look at one example here: https://covid.lanthaler.com/BO/cochabamba/
I hope you keep up your great work. Thank you
Thank you for the kind words and for reporting these issues. I can confirm that I see some of the problems that you reported, for example Tlaxcala's numbers:
I'm guessing it's some date-parsing error. I'll look into it and get back to you.
We narrowed it down to a particularly careless data source, and we now heavily filter their data to only take what looks reasonable. I visually inspected all the examples you provided, and they look fine to me now. Can you verify?
Also, can I add your page to the grid of data users at the top of the page?
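The thread does not say how the filtering mentioned above actually works. As one possible illustration only, a cumulative series could be cleaned by masking any value that falls below the highest value seen so far, which would catch a dip like the one reported for Greater Poland. The function name `mask_drops` and the list-of-totals input are assumptions made for this sketch, not part of the project's code.

```python
from typing import List, Optional


def mask_drops(totals: List[Optional[int]]) -> List[Optional[int]]:
    """Replace values that break a non-decreasing cumulative series with None.

    `totals` is a hypothetical list of cumulative counts sorted by date.
    Missing values (None) are passed through unchanged.
    """
    cleaned: List[Optional[int]] = []
    running_max: Optional[int] = None
    for value in totals:
        if value is None:
            cleaned.append(None)
        elif running_max is not None and value < running_max:
            # A cumulative total cannot shrink, so treat this value as unknown.
            cleaned.append(None)
        else:
            cleaned.append(value)
            running_max = value
    return cleaned
```

With the rough Greater Poland figures from the report, `mask_drops([2000, 24, 2000])` keeps both 2000 values and masks the 24. Note that this simple rule trusts upward jumps, so a single spuriously high value would mask every correct value after it; a real filter would need to handle that case as well.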
I will check them. And of course you can add us to the grid of data users.
The data for Bolivia, Brazil, Chile looks very good.
There are still some minor data anomalies:
Argentina:
- Chubut (has 84 total cases on April 14; on April 15 it is reduced to 1)
- La Pampa (has more total deaths than total infected)
- La Rioja (same as Chubut)
Peru:
- Lima (total deaths are the same as total cases)
- nearly all provinces show the same behaviour as Lima
Mexico:
- Baja California: the current day shows only a fraction of the previous day (looks like an incomplete count for this day)
- Campeche (same as Baja)
- Chiapas (same as Baja)
- Morelos (same as Baja)
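The anomalies listed above fall into a few patterns that can be checked automatically: a cumulative total that drops from one day to the next, more deaths than cases, and deaths exactly equal to cases. A minimal sketch of such a check follows. The row keys `date`, `total_confirmed` and `total_deceased` are hypothetical names chosen for this example and may not match the actual column names in the dataset.

```python
from typing import Any, Dict, List


def find_anomalies(rows: List[Dict[str, Any]]) -> List[str]:
    """Return descriptions of suspicious rows for one region.

    `rows` must be sorted by date. Each row is a dict with the hypothetical
    keys "date", "total_confirmed" and "total_deceased" (values may be missing).
    """
    problems: List[str] = []
    prev_confirmed = None
    for row in rows:
        date = row["date"]
        confirmed = row.get("total_confirmed")
        deceased = row.get("total_deceased")

        # Cumulative totals should never go down from one day to the next.
        if (
            confirmed is not None
            and prev_confirmed is not None
            and confirmed < prev_confirmed
        ):
            problems.append(
                f"{date}: total cases dropped from {prev_confirmed} to {confirmed}"
            )

        if confirmed is not None and deceased is not None:
            if deceased > confirmed:
                # Deaths can never exceed the number of cases.
                problems.append(f"{date}: more deaths ({deceased}) than cases ({confirmed})")
            elif deceased == confirmed and confirmed > 0:
                # Identical non-zero values hint at a duplicated column or source.
                problems.append(f"{date}: deaths equal cases ({confirmed})")

        if confirmed is not None:
            prev_confirmed = confirmed
    return problems
```

For Chubut's two reported values (84 total cases on April 14, then 1 on April 15), the function returns a single message describing the drop from 84 to 1.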
Thank you for the detailed feedback!
Argentina
We just switched to a new data source via #301 so all of these should be resolved.
Peru
I had made a silly mistake and used the same URL for confirmed and deceased cases... Fix via #302
Mexico
I will double check, but I think this is just the nature of our data source, which outputs incomplete data for the latest day. If it's frequent enough (i.e. it's happening for all subregions), I would consider tossing out the latest day, but I would strongly prefer not to filter the data since it's coming directly from an authoritative source.
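To make the "frequent enough" condition concrete, here is a rough sketch of how one might decide whether to toss out the latest day. It assumes a hypothetical mapping from subregion name to a date-sorted list of cumulative totals, and a made-up threshold `min_share`; neither comes from the project itself.

```python
from typing import Dict, List


def latest_day_looks_incomplete(
    series_by_region: Dict[str, List[int]], min_share: float = 0.5
) -> bool:
    """Return True if the latest cumulative total dropped in enough subregions.

    `series_by_region` maps a subregion name to its cumulative totals sorted
    by date. `min_share` is an arbitrary, illustrative threshold.
    """
    checked = 0
    dropped = 0
    for totals in series_by_region.values():
        if len(totals) < 2:
            continue  # Not enough history to compare.
        checked += 1
        if totals[-1] < totals[-2]:
            dropped += 1
    return checked > 0 and dropped / checked >= min_share


def drop_latest_day(series_by_region: Dict[str, List[int]]) -> Dict[str, List[int]]:
    """Return a copy of the data without the last entry of every series."""
    return {region: totals[:-1] for region, totals in series_by_region.items()}
```

Keeping the check separate from the drop leaves the authoritative data untouched unless the pattern really shows up across subregions.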