
Tracking issue: improvements on `add_regions_to_table`

Open pabloarosado opened this issue 1 year ago • 3 comments

List of known issues and possible improvements for the new function that creates region aggregates, `etl.data_helpers.geo.add_regions_to_table`:

  • [ ] When searching for overlaps between historical regions and their successors (or between regions and overseas territories), only the default regions are inspected. If one creates a custom region, e.g. "Asia excl. China", or another more complicated combination of countries, any possible overlaps are ignored. It would be easy to include custom regions in the argument `regions_and_members` passed to `detect_overlapping_regions`, inside `add_regions_to_table`.
  • [ ] Currently, the function checks for overlaps, but does nothing about them. Ideally, we should be able to remove overlaps. Imagine you have a dataset that has data for both Russia and USSR for the same years. You may want to keep the data for both, but when creating the aggregate for Europe, you need to remove one of them to avoid double counting. The removal itself is relatively easy to implement; the complication is deciding which country to remove (either the historical region or the successor).
  • [ ] Currently, the function adds nothing to the processing log. It would be easy to add an entry at the very end (maybe with a new operation, called e.g. "add_regions").
  • [ ] Likely hard to implement: It would be good to track which countries have been included in each aggregate. This is useful, for example, when calculating per capita indicators of regions. Otherwise, a per capita indicator is calculated as the sum of the original indicator (for a subset of countries) divided by the population of the entire continent, which would be an underestimate. We have corrected for this issue in some datasets, e.g. FAOSTAT. But we don't have a generic solution.
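
To make the second and fourth points above more concrete, here is a minimal sketch in plain Python, not the actual `etl` implementation. All names in it (`find_overlaps`, `aggregate_regions`, `SUCCESSORS`, the sample rows and the `drop` argument) are hypothetical and exist only for illustration. It checks overlaps for every region passed in (including a custom one), drops one side of each overlapping pair before summing, and records which countries contributed to each aggregate.

```python
from collections import defaultdict

# Hypothetical sample data: (country, year, value). Not taken from any real dataset.
ROWS = [
    ("Russia", 1990, 10.0),
    ("USSR", 1990, 25.0),
    ("Germany", 1990, 5.0),
    ("China", 1990, 100.0),
]

# Hypothetical region definitions; "Custom" stands in for a user-defined region.
REGIONS = {
    "Europe": {"Russia", "USSR", "Germany"},
    "Custom": {"Russia", "USSR", "China"},
}

# Hypothetical mapping from a historical region to its successors.
SUCCESSORS = {"USSR": ["Russia", "Ukraine"]}


def find_overlaps(rows, regions_and_members, successors):
    """Map (region, year) to the (historical, successor) pairs that both have data."""
    countries_by_year = defaultdict(set)
    for country, year, _value in rows:
        countries_by_year[year].add(country)

    overlaps = defaultdict(list)
    for region, members in regions_and_members.items():
        for historical, successor_list in successors.items():
            if historical not in members:
                continue
            for successor in successor_list:
                if successor not in members:
                    continue
                for year, present in countries_by_year.items():
                    if historical in present and successor in present:
                        overlaps[(region, year)].append((historical, successor))
    return dict(overlaps)


def aggregate_regions(rows, regions_and_members, successors, drop="historical"):
    """Sum values per (region, year), dropping one side of each overlapping pair.

    `drop` is either "historical" or "successor"; choosing between them is
    exactly the open question raised in the issue, so it is left to the caller.
    """
    overlaps = find_overlaps(rows, regions_and_members, successors)
    totals = defaultdict(float)
    included = defaultdict(set)
    for country, year, value in rows:
        for region, members in regions_and_members.items():
            if country not in members:
                continue
            pairs = overlaps.get((region, year), [])
            dropped = {h if drop == "historical" else s for h, s in pairs}
            if country in dropped:
                continue
            totals[(region, year)] += value
            included[(region, year)].add(country)
    return dict(totals), {key: sorted(names) for key, names in included.items()}


totals, included = aggregate_regions(ROWS, REGIONS, SUCCESSORS)
print(totals[("Europe", 1990)])    # 15.0 (USSR dropped, Russia and Germany kept)
print(included[("Europe", 1990)])  # ['Germany', 'Russia']
```

The `included` mapping is the piece relevant to per capita indicators: under this sketch, the denominator for a region could be the summed population of only the countries listed there, rather than the population of the whole continent.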

pabloarosado avatar Dec 13 '23 07:12 pabloarosado

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] avatar Feb 13 '24 02:02 stale[bot]

Found a bug in detecting overlapping countries when country is of type category. Raised it in https://github.com/owid/etl/issues/2474

lucasrodes avatar Mar 29 '24 13:03 lucasrodes

> Found a bug in detecting overlapping countries when country is of type category. Raised it in #2474

Fixed in https://github.com/owid/etl/pull/2478

pabloarosado avatar Mar 29 '24 15:03 pabloarosado

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] avatar May 30 '24 00:05 stale[bot]