Tracking issue: improvements on `add_regions_to_table`
List of known issues and improvements on the new function to create region aggregates, `etl.data_helpers.geo.add_regions_to_table`:
- [ ] When searching for overlaps between historical regions and successors (or between regions and overseas territories), only the default regions are inspected. But if one creates a custom region, e.g. "Asia excl. China", or other more complicated combination of countries, any possible overlaps are disregarded. It would be easy to include custom regions as part of the argument `regions_and_members` passed to `detect_overlapping_regions`, inside `add_regions_to_table`.
- [ ] Currently, the function checks for overlaps, but does nothing about them. Ideally, we should be able to remove overlaps. Imagine you have a dataset that has data for both Russia and USSR on the same years. You may want to keep the data of both, but when creating the aggregate for Europe, you need to remove one of them, to avoid double counting. It is relatively easy to implement, but the complication is in deciding which country to remove (either the historical region or the successor).
- [ ] Currently, the function adds nothing to the processing log. It would be easy to add an entry at the very end (maybe with a new operation, called e.g. "add_regions").
- [ ] Likely hard to implement: It would be good to track which countries have been included in each aggregate. This is useful, for example, when calculating per capita indicators of regions. Otherwise, a per capita indicator is calculated as the sum of the original indicator (for a subset of countries) divided by the population of the entire continent, which would be an underestimate. We have corrected for this issue in some datasets, e.g. FAOSTAT. But we don't have a generic solution.
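To make the first two items concrete, here is a minimal, self-contained sketch in plain Python. It does not use the real `etl.data_helpers.geo` code: the helpers `detect_overlaps` and `rows_for_aggregate`, the `keep` parameter, the region definitions, and all numbers are hypothetical and only illustrate the idea. It assumes overlaps are checked per region (default and custom alike) and that the caller chooses which side of a historical/successor pair to drop before summing.

```python
from collections import defaultdict

# Hypothetical region definitions; the real ones live in the regions dataset.
DEFAULT_REGIONS = {"Europe": {"Russia", "USSR", "France"}}
CUSTOM_REGIONS = {"Europe excl. France": {"Russia", "USSR"}}
# Hypothetical mapping from a historical region to its successors.
HISTORICAL_TO_SUCCESSORS = {"USSR": {"Russia", "Ukraine"}}

# Made-up data as (country, year, value) rows.
rows = [
    ("Russia", 1990, 10.0),
    ("USSR", 1990, 25.0),
    ("USSR", 1980, 20.0),
    ("France", 1990, 5.0),
]


def detect_overlaps(rows, regions_and_members, historical_to_successors):
    """Return {region: {year: countries with overlapping data in that year}}."""
    years_by_country = defaultdict(set)
    for country, year, _ in rows:
        years_by_country[country].add(year)

    overlaps = {}
    for region, members in regions_and_members.items():
        by_year = defaultdict(set)
        for historical, successors in historical_to_successors.items():
            if historical not in members:
                continue
            for successor in successors & members:
                shared = years_by_country[historical] & years_by_country[successor]
                for year in shared:
                    by_year[year] |= {historical, successor}
        if by_year:
            overlaps[region] = dict(by_year)
    return overlaps


def rows_for_aggregate(rows, members, historical_to_successors, keep="successor"):
    """Rows of `members`, dropping one side of each overlapping (country, year) pair."""
    present = {(c, y) for c, y, _ in rows if c in members}
    drop = set()
    for historical, successors in historical_to_successors.items():
        for successor in successors:
            for country, year in present:
                if country == historical and (successor, year) in present:
                    loser = historical if keep == "successor" else successor
                    drop.add((loser, year))
    return [(c, y, v) for c, y, v in rows if c in members and (c, y) not in drop]


# Custom regions are inspected simply by passing them alongside the defaults.
all_regions = {**DEFAULT_REGIONS, **CUSTOM_REGIONS}
overlaps = detect_overlaps(rows, all_regions, HISTORICAL_TO_SUCCESSORS)

# The original rows stay untouched; only the aggregate's input is filtered.
europe_rows = rows_for_aggregate(rows, all_regions["Europe"], HISTORICAL_TO_SUCCESSORS)
europe_1990 = sum(v for _, y, v in europe_rows if y == 1990)
```

With `keep="successor"` the 1990 aggregate for Europe uses Russia and France only; with `keep="historical"` it would use USSR and France instead. Which default is right is exactly the open question raised in the second item.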
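The per capita problem in the last item can be shown with a small worked example. This is only a sketch under stated assumptions: `aggregate_with_members` is a hypothetical helper (not part of `add_regions_to_table`), and the countries, values, and populations are made up. It assumes the aggregate records, per year, which members actually contributed data, so that the denominator can be restricted to the same members.

```python
def aggregate_with_members(rows, members):
    """Sum values per year and record which member countries contributed."""
    totals, included = {}, {}
    for country, year, value in rows:
        if country in members and value is not None:
            totals[year] = totals.get(year, 0.0) + value
            included.setdefault(year, set()).add(country)
    return totals, included


# Made-up region, populations, and indicator values; Italy has no data.
members = {"France", "Germany", "Italy"}
population = {("France", 2020): 50, ("Germany", 2020): 100, ("Italy", 2020): 50}
rows = [("France", 2020, 100.0), ("Germany", 2020, 200.0)]

totals, included = aggregate_with_members(rows, members)

# Naive: numerator covers two countries, denominator covers all three.
naive = totals[2020] / sum(population[(c, 2020)] for c in members)

# Corrected: denominator restricted to the countries that contributed data.
corrected = totals[2020] / sum(population[(c, 2020)] for c in included[2020])
```

Here the naive value is 300 / 200 = 1.5, while restricting the population to the contributing countries gives 300 / 150 = 2.0, which shows the underestimate described above.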
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Found a bug in detecting overlapping countries when `country` is of type `category`. Raised it in https://github.com/owid/etl/issues/2474
Fixed in https://github.com/owid/etl/pull/2478