
Tracking change in data schema over time

Open tnwei opened this issue 2 years ago • 0 comments

The data schema in this repo (i.e. the columns in the CSVs) changes over time as the pandemic response evolves. For example, commit https://github.com/MoH-Malaysia/covid19-public/commit/74003a38a9e494b37f991d6c9b6321be8f3b90e6 renames deaths to deaths_new in deaths_malaysia.csv and deaths_state.csv. Although this is natural for pandemic data collection, it can disrupt downstream data consumers, who then have to modify their applications to keep up with the changed schema.
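On the consumer side, one common way to absorb a rename like this is a small compatibility shim that maps legacy column names onto current ones when the CSV is read. Below is a minimal sketch using Python's standard csv module. The RENAMES table and the read_rows helper are hypothetical. Only the deaths to deaths_new entry comes from the commit above, and the row values are placeholders.

```python
import csv
import io

# Hypothetical mapping from legacy column names to current ones.
# Only the deaths -> deaths_new rename is taken from the issue text.
RENAMES = {"deaths": "deaths_new"}


def read_rows(csv_text: str) -> list[dict[str, str]]:
    """Parse CSV text into dicts, renaming any legacy columns to current names."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [{RENAMES.get(col, col): val for col, val in row.items()} for row in reader]


# Placeholder data: old-style and new-style files parse to the same shape.
print(read_rows("date,deaths\n2021-09-15,0\n"))
# [{'date': '2021-09-15', 'deaths_new': '0'}]
print(read_rows("date,deaths_new\n2021-09-15,0\n"))
# [{'date': '2021-09-15', 'deaths_new': '0'}]
```

The shim only works while someone keeps the mapping current, which is why a maintained record of schema changes is useful.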

I'm keeping track of the data schema changes at https://gist.github.com/tnwei/507f582644b9a8c8be167637cea1e2fc, which I update regularly. I'd like to suggest linking to this list in the README, so that data consumers can see the history of schema changes.
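A list like this can be kept up to date by comparing the header row of each CSV across versions. Below is a minimal sketch. It assumes you have two snapshots of the same file as text, for example fetched from two different commits. read_header and diff_schema are hypothetical helper names and are not part of the repo.

```python
import csv
import io


def read_header(csv_text: str) -> list[str]:
    """Return the column names from the first row of a CSV string."""
    return next(csv.reader(io.StringIO(csv_text)))


def diff_schema(old_csv: str, new_csv: str) -> dict[str, list[str]]:
    """Report columns added and removed between two snapshots of one CSV."""
    old_cols = read_header(old_csv)
    new_cols = read_header(new_csv)
    return {
        "added": [c for c in new_cols if c not in old_cols],
        "removed": [c for c in old_cols if c not in new_cols],
    }


# Header-only snapshots of deaths_malaysia.csv around the rename in the issue.
print(diff_schema("date,deaths\n", "date,deaths_new\n"))
# {'added': ['deaths_new'], 'removed': ['deaths']}
```

A pure rename shows up as one removed column plus one added column. Recognising the pair as a rename still takes a human, or a hand-maintained record like the gist above.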

I'll put in a PR if @MoH-Malaysia is on board. Thank you for your tireless data updates!

Related suggestion in CITF repo: https://github.com/CITF-Malaysia/citf-public/issues/98

tnwei, Sep 15 '21 13:09