
Is fine-grained/individual data being collected, too? Will it ever be shared?

Open · psyguy opened this issue 4 years ago · 1 comment

Hi,

The datasets released in this repo (like those of Johns Hopkins University at the country level) are coarse: they contain only total counts of infected, dead, and recovered. Such coarseness severely constrains statistical modeling beyond simple descriptive analysis.¹

By individual/fine-grained data I mean status updates for each individual from testing positive until the case is closed (i.e., by recovery or death), plus some background characteristics (age, sex, pre-existing conditions, etc.). A similar dataset has been collected by Xu and colleagues (Nature Scientific Data, DOI:10.1038/s41597-020-0448-0) and is available in @beoutbreakprepared's repository.
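To make the request concrete, here is a minimal sketch of what one record of such a dataset might look like, and how it yields the diagnosis-to-death delay that aggregate counts cannot provide. The `CaseRecord` schema, its field names, and the records below are hypothetical toy values for illustration; they are not the format used by Xu et al. or by any NYT release.

```python
from dataclasses import dataclass, field
from datetime import date
from statistics import median
from typing import List, Optional


@dataclass
class CaseRecord:
    """One individual, followed from positive test until the case is closed.

    Hypothetical schema for illustration only; field names are assumptions.
    """
    case_id: str
    age: Optional[int] = None
    sex: Optional[str] = None
    conditions: List[str] = field(default_factory=list)  # pre-existing conditions
    date_confirmed: Optional[date] = None
    outcome: Optional[str] = None  # "recovered", "died", or None while still open
    date_outcome: Optional[date] = None


def delays_to_death(records: List[CaseRecord]) -> List[int]:
    """Days from confirmation to death, using closed fatal cases only."""
    return [
        (r.date_outcome - r.date_confirmed).days
        for r in records
        if r.outcome == "died"
        and r.date_confirmed is not None
        and r.date_outcome is not None
    ]


# Made-up toy records, not real data.
records = [
    CaseRecord("a", 70, "M", ["diabetes"], date(2020, 3, 1), "died", date(2020, 3, 15)),
    CaseRecord("b", 40, "F", [], date(2020, 3, 5), "recovered", date(2020, 3, 20)),
    CaseRecord("c", 81, "F", [], date(2020, 3, 10), "died", date(2020, 3, 19)),
    CaseRecord("d", 55, "M", [], date(2020, 3, 25)),  # still open: no outcome yet
]
print(delays_to_death(records))          # [14, 9]
print(median(delays_to_death(records)))  # 11.5
```

Open cases (like `"d"`) are simply excluded here; with coarse totals there is no way to tell open cases from closed ones at all.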

I can imagine @nytimes would have legal concerns (e.g., regarding HIPAA, as @CBG-63 mentioned in #11). I'm no lawyer, but I hope some form of anonymization could address them.


¹ Actually, if you are interested in modeling with coarse data, take a look at the coarseDataTools R package. It can estimate, e.g., the case fatality rate (Reich et al. 2012; DOI:10.1111/j.1541-0420.2011.01709.x).

psyguy, Mar 27 '20 18:03

This would be particularly helpful for estimating the long-term case fatality rate while we're still in the early stages of rapid exponential growth.

The large majority of confirmed cases in the US were contracted recently and have not yet run their course. Individual-level data would give accurate knowledge of the typical lag from diagnosis to death in fatal cases, so that estimates of the case fatality rate could be properly adjusted for it.
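The adjustment can be sketched with a common delay-adjusted estimator (a swapped-in technique, not one proposed in this thread): instead of dividing cumulative deaths by all confirmed cases, divide by the expected number of cases whose outcome would already be observable, i.e. the sum over days i of c_i · F(t − i), where c_i is new confirmed cases on day i and F is the cumulative distribution of the confirmation-to-death delay. The function names and the toy numbers are hypothetical; `delay_pmf` is exactly the input that individual-level data would let you estimate.

```python
from typing import Sequence


def naive_cfr(daily_cases: Sequence[float], daily_deaths: Sequence[float]) -> float:
    """Cumulative deaths / cumulative cases; biased low during growth."""
    return sum(daily_deaths) / sum(daily_cases)


def delay_adjusted_cfr(
    daily_cases: Sequence[float],
    daily_deaths: Sequence[float],
    delay_pmf: Sequence[float],
) -> float:
    """Delay-adjusted CFR at the last day t of the series.

    delay_pmf[k] is the (assumed known) probability that a fatal case
    dies k days after confirmation.
    """
    if not delay_pmf:
        raise ValueError("delay_pmf must be non-empty")
    # Build the cumulative distribution F(k) = P(delay <= k).
    cdf, running = [], 0.0
    for p in delay_pmf:
        running += p
        cdf.append(running)

    def F(k: int) -> float:
        if k < 0:
            return 0.0
        return cdf[k] if k < len(cdf) else cdf[-1]

    t = len(daily_cases) - 1
    # Cases confirmed on day i count only in proportion F(t - i) to which
    # their outcome (if fatal) would already be visible by day t.
    denom = sum(c * F(t - i) for i, c in enumerate(daily_cases))
    if denom == 0:
        raise ValueError("no cases have had time to reach an outcome")
    return sum(daily_deaths) / denom


# Toy epidemic doubling daily, true CFR 0.1, deaths 2 or 3 days after confirmation.
cases = [10, 20, 40, 80]
deaths = [0, 0, 0.5, 1.5]
print(naive_cfr(cases, deaths))                             # about 0.0133
print(delay_adjusted_cfr(cases, deaths, [0, 0, 0.5, 0.5]))  # 0.1
```

In this toy example the naive ratio understates the true value by a factor of 7.5, because most cases are too recent to have died yet; the adjusted estimator recovers 0.1 only because the delay distribution is assumed known.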

davidsj, Mar 27 '20 20:03