
Reference Administrative Names

Open hamishgibbs opened this issue 4 years ago • 6 comments

Currently, many regional case count datasets are being returned from the package without clear reference to an existing geographic dataset. This means that users need to do some name matching before mapping case counts or joining them to other available datasets.

We are considering adding an iso_3166_2 field to all regional case counts to allow quick joins. This would improve the quality of the data being provided to users but involves some more work to manually match administrative names and fix administrative name matching as datasets change.

The current proposal is to create a directory in the raw-data folder containing lookup tables with two fields: name_as_recieved and iso_3166_2. A helper function, incorporated into the existing functions, would read from this directory (hosted on GitHub) and join ISO codes to administrative names. We can then write tests to check that names continue to match the lookup tables exactly.
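To make the idea concrete, here is a rough sketch of the lookup-and-join step and the exact-match check. It is written in Python purely for illustration and is not the package's implementation. The function names (`load_lookup`, `add_iso_codes`, `unmatched_names`), the `region` column, and the sample rows are all hypothetical; only the two lookup field names come from the proposal above.

```python
import csv
import io

# Hypothetical lookup table contents. In the proposal this would be a CSV
# file in the raw-data folder; the rows here are illustrative only.
LOOKUP_CSV = """name_as_recieved,iso_3166_2
Lombardia,IT-25
Lazio,IT-62
"""


def load_lookup(text):
    """Parse lookup CSV text into a dict: received name -> ISO 3166-2 code."""
    reader = csv.DictReader(io.StringIO(text))
    return {row["name_as_recieved"]: row["iso_3166_2"] for row in reader}


def add_iso_codes(rows, lookup, name_field="region"):
    """Return copies of rows with an iso_3166_2 field (None if unmatched)."""
    return [{**row, "iso_3166_2": lookup.get(row[name_field])} for row in rows]


def unmatched_names(rows, lookup, name_field="region"):
    """Names in the data that have no exact match in the lookup table."""
    return sorted({row[name_field] for row in rows} - set(lookup))


# Made-up case counts; "Latium" stands in for a name that changed upstream.
lookup = load_lookup(LOOKUP_CSV)
cases = [
    {"region": "Lombardia", "cases": 10},
    {"region": "Latium", "cases": 3},
]
joined = add_iso_codes(cases, lookup)
missing = unmatched_names(cases, lookup)
```

A test for each dataset could then assert that `unmatched_names` returns an empty list, so that a renamed region in the source data fails loudly instead of silently producing a missing code.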

I believe this would improve the usability of the data, but it would slightly increase the work needed to create a new function and would lead to more tests breaking when datasets change.

It would be good to hear how people feel about this addition, especially as we add more case counts for LMICs.

@seabbs @kathsherratt @ffinger

hamishgibbs, Apr 10 '20 19:04