covid19.analytics

The accumulated deaths for some countries are wrong

Open carloserwin opened this issue 5 years ago • 6 comments

For example, run the following:

library(covid19.analytics)
data <- covid19.data("ts-deaths")    # cumulative deaths time series

x <- data[data$Country.Region == "Germany", ]
y <- as.numeric(x[-(1:4)])           # drop the four location columns
plot(y, type = "l")

Do you see the problem?

If not, look at the cumulative number of deaths on these dates:

2020-04-10: 2767
2020-04-11: 2736

The total went down (2767 > 2736), which can't happen for a cumulative count.
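The arithmetic behind the complaint can be checked directly: for a cumulative series, every day-over-day difference must be zero or positive. A minimal R sketch using the two values quoted above, typed in by hand rather than loaded from the package:

```r
# Cumulative deaths for Germany on the two dates quoted above (entered by hand)
deaths <- c("2020-04-10" = 2767, "2020-04-11" = 2736)

# Day-over-day change; a cumulative count should never go down
change <- diff(deaths)
print(change)   # named value: 2020-04-11 = -31, i.e. the series decreased
```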

This also happens for India; I don't know whether other countries are affected.

Regards, CE

Obviously, this is not correct.

carloserwin, Apr 13 '20 22:04

Thanks for reporting this. I looked at the data, and the issue is in the original data source from JHU; see

https://github.com/CSSEGISandData/COVID-19/blob/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_deaths_global.csv

The problem is visible for Germany, as you noticed. However, I couldn't find any problem with the numbers reported for India.

There is not much I can do other than let the people at JHU know, since they are the source of the data in this case.

mponce0, Apr 13 '20 22:04

Thanks!!!

For India, you can spot the mistake easily:

x <- data[data$Country.Region == "India", -(1:4)]   # drop the four location columns
diff(as.numeric(x))                                  # day-over-day changes

If there is a negative value, something must be wrong: a cumulative count can never decrease.
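The same idea can be wrapped into a small helper that reports every day on which a cumulative series drops. This is a hypothetical function (`find_decreases` is not part of covid19.analytics), shown on a made-up vector so it runs without downloading anything; with the real data one would pass `as.numeric(x)` from the India snippet.

```r
# Hypothetical helper (not part of covid19.analytics): return the positions
# where a cumulative series is lower than on the previous day.
find_decreases <- function(cumulative) {
  steps <- diff(cumulative)
  # diff() is one element shorter than its input, so add 1 to point at the later day;
  # which() skips NA comparisons, so missing values are not flagged
  which(steps < 0) + 1L
}

# Made-up cumulative counts with drops on days 4 and 6
toy <- c(10, 12, 15, 14, 20, 19, 25)
find_decreases(toy)   # 4 6
```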

Cheers! CE

carloserwin, Apr 13 '20 23:04

Yes, I can see that, thanks again.

I have opened an issue with JHU/CSSEGIS; see https://github.com/CSSEGISandData/COVID-19/issues/2165

This is the list of locations where I found these anomalies:

44  Prince Edward Island, Canada
45  Quebec, Canada
91  Cyprus
107 Finland
121 Germany
131 Iceland
132 India
142 Kazakhstan
183 Philippines
195 Serbia
198 Slovakia
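A list like this one can be produced by applying the same check to every row of the wide time-series table. The sketch below assumes, as in the earlier snippets, that the first four columns hold location metadata and the remaining columns hold daily cumulative counts; the tiny data frame is invented for illustration and is not real JHU data.

```r
# Invented miniature table with the same assumed layout as the JHU time series:
# four location columns followed by one column per day.
tbl <- data.frame(
  Province.State = c("", "Quebec", ""),
  Country.Region = c("Germany", "Canada", "Chile"),
  Lat = c(51, 52, -35), Long = c(10, -73, -71),
  d1 = c(10, 5, 1), d2 = c(12, 7, 2), d3 = c(11, 9, 3)
)

counts <- as.matrix(tbl[, -(1:4)])
# A row is anomalous if any day-over-day difference is negative
has_drop <- apply(counts, 1, function(row) any(diff(row) < 0, na.rm = TRUE))

# Row numbers and locations of the anomalous series (only Germany here)
tbl[has_drop, c("Province.State", "Country.Region")]
```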

In the meantime, I will implement some checks to warn users about this.

mponce0, Apr 13 '20 23:04

Great, thanks!

And thanks for writing this R package!!

Cheers, CE


carloserwin, Apr 13 '20 23:04

Three new functions have been added to the package to test for data integrity and consistency:

  • integrity.check: determines whether there are integrity issues within the datasets, or changes to the structure of the data as reported by JHU/CSSEGIS
  • consistency.check: determines whether there are consistency issues within the data, such as anomalies in the cumulative quantities reported by JHU/CSSEGIS
  • data.checks: checks for both data integrity and data consistency by invoking the two functions above
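For readers curious about what such checks involve, here is a rough, hypothetical sketch of the two ideas. It is not the package's implementation, and `check_structure()` and `check_cumulative()` are invented names: the first verifies that the assumed metadata columns exist and the counts are numeric, the second warns whenever a cumulative series decreases. For the real functions and their arguments, consult the covid19.analytics documentation.

```r
# Hypothetical sketches; NOT the covid19.analytics implementation.
# Assumed layout: four metadata columns, then one numeric column per day.
expected_cols <- c("Province.State", "Country.Region", "Lat", "Long")

# Structural (integrity-style) check: expected columns present, counts numeric
check_structure <- function(df) {
  ok <- all(expected_cols %in% names(df)) &&
    all(vapply(df[, -(1:4), drop = FALSE], is.numeric, logical(1)))
  if (!ok) warning("unexpected structure: metadata columns missing or non-numeric counts")
  ok
}

# Consistency-style check: warn about locations whose cumulative counts decrease
check_cumulative <- function(df) {
  counts <- as.matrix(df[, -(1:4), drop = FALSE])
  bad <- which(apply(counts, 1, function(r) any(diff(r) < 0, na.rm = TRUE)))
  if (length(bad) > 0) {
    warning("cumulative counts decrease for: ",
            paste(df$Country.Region[bad], collapse = ", "))
  }
  length(bad) == 0
}

# Invented one-row example with a drop from 12 to 11
tbl <- data.frame(Province.State = "", Country.Region = "Germany",
                  Lat = 51, Long = 10, d1 = 10, d2 = 12, d3 = 11)
check_structure(tbl)    # TRUE
check_cumulative(tbl)   # FALSE, plus a warning naming Germany
```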

These functions are already part of the development version of the package, available in the GitHub repository, and will be included in the next version submitted to CRAN.

mponce0, Apr 16 '20 06:04

These functions are also part of version v1.1, which is available to install from CRAN.

mponce0, May 13 '20 06:05