nomenclature
Add failing test cases to illustrate potentially conflicting information
Closes #290.
@danielhuppmann, I took a look at the questions you brought up in #290 and I think we should be good.
The case that you described would be as follows. Given a model mapping:
model: m_a
native_regions: [region_A, region_B]
common_regions:
- region_C: [region_A, region_B]
with a variable code list:
- Variable A:
    definition: Test variable to be used for computing a max aggregate
    unit: EJ/yr
    region-aggregation:
      - Variable A (max):
          method: max
- Variable A (max):
    unit: EJ/yr
and input data:
import pandas as pd
from pyam import IAMC_IDX, IamDataFrame

IamDataFrame(
    pd.DataFrame(
        [
            ["m_a", "s_a", "region_A", "Variable A", "EJ/yr", 1],
            ["m_a", "s_a", "region_B", "Variable A", "EJ/yr", 1],
            ["m_a", "s_a", "region_A", "Variable A (max)", "EJ/yr", 2],
            ["m_a", "s_a", "region_B", "Variable A (max)", "EJ/yr", 1],
        ],
        columns=IAMC_IDX + [2020],
    )
)
yields a pyam error for duplicate data:
E ValueError: Duplicate rows in `data`:
E model scenario region variable unit year
E 0 m_a s_a region_C Variable A (max) EJ/yr 2020
meaning that, as expected, both operations are attempted: the aggregation to `Variable A (max)` through the `region-aggregation` attribute of `Variable A`, as well as the "standard" aggregation from the entry `Variable A (max)`.
This case is safe though since pyam yields an error. We could specifically protect against it but I'd say it's fine.
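To make the source of the duplicate concrete, here is a minimal pure-Python sketch of the two aggregation paths. The `aggregate` helper and the `(region, variable, value)` row layout are hypothetical stand-ins, not nomenclature's or pyam's actual code, and the default of `sum` for an entry without a method is an assumption based on the behaviour described in this PR.

```python
from collections import Counter

# Toy rows (region, variable, value) mirroring the input data above.
rows = [
    ("region_A", "Variable A", 1),
    ("region_B", "Variable A", 1),
    ("region_A", "Variable A (max)", 2),
    ("region_B", "Variable A (max)", 1),
]


def aggregate(rows, variable, target_region, method, rename=None):
    """Hypothetical helper: aggregate one variable into target_region."""
    values = [value for _, var, value in rows if var == variable]
    return (target_region, rename or variable, method(values))


# Path 1: region-aggregation attribute of "Variable A" (max, then renamed).
via_attribute = aggregate(
    rows, "Variable A", "region_C", max, rename="Variable A (max)"
)
# Path 2: "standard" aggregation of the entry "Variable A (max)"
# (assumed default method: sum).
via_entry = aggregate(rows, "Variable A (max)", "region_C", sum)

# Both results share the same (region, variable) index, hence the duplicate.
counts = Counter(
    (region, variable) for region, variable, _ in (via_attribute, via_entry)
)
duplicates = [key for key, n in counts.items() if n > 1]
print(duplicates)
```

In this sketch the two paths even disagree on the value (1 from the max path, 3 from the sum path), so failing loudly on the duplicate index seems like the right outcome.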
There might be more cases to consider though.
Only Variable A (max)
Take the above data but remove the first two rows, i.e. those for `Variable A`. In this case we'd get the following aggregation result:
   model  scenario    region          variable   unit  year  value
0    m_a       s_a  region_A  Variable A (max)  EJ/yr  2020      2
1    m_a       s_a  region_B  Variable A (max)  EJ/yr  2020      1
2    m_a       s_a  region_C  Variable A (max)  EJ/yr  2020      3
For `region_C`, we now get 3, which is the sum, not the max, of `region_A` and `region_B`.
This is wrong but expected, since no method is set for the aggregation of `Variable A (max)`. We could safeguard against that relatively easily by enforcing that the aggregation method given in the `region-aggregation` attribute and the one on the "normal" variable entry must be the same. So:
- Variable A:
    definition: Test variable to be used for computing a max aggregate
    unit: EJ/yr
    region-aggregation:
      - Variable A (max):
          method: max
- Variable A (max):
    unit: EJ/yr
    method: max
in the above example. We could also make it simpler and remove the `method` attribute from the variable inside the `region-aggregation` attribute, so that the method information is taken from the main variable entry directly.
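A consistency check along those lines could look roughly like the sketch below. It assumes the code list has already been parsed into a plain dict mapping variable names to their attributes. `find_method_conflicts` and `DEFAULT_METHOD` are hypothetical names, not part of nomenclature, and treating a missing method as `sum` is an assumption based on the result shown above.

```python
DEFAULT_METHOD = "sum"  # assumption: used when an entry sets no method


def find_method_conflicts(codelist):
    """Return (parent, target, nested_method, own_method) for each mismatch."""
    conflicts = []
    for name, attrs in codelist.items():
        for item in attrs.get("region-aggregation", []):
            for target, target_attrs in item.items():
                if target not in codelist:
                    continue  # no separate entry, so nothing to conflict with
                nested = (target_attrs or {}).get("method", DEFAULT_METHOD)
                own = codelist[target].get("method", DEFAULT_METHOD)
                if nested != own:
                    conflicts.append((name, target, nested, own))
    return conflicts


variable_a = {
    "unit": "EJ/yr",
    "region-aggregation": [{"Variable A (max)": {"method": "max"}}],
}
# First code list from this PR: no method on the "Variable A (max)" entry.
inconsistent = {"Variable A": variable_a, "Variable A (max)": {"unit": "EJ/yr"}}
# Second code list: both places agree on "max".
consistent = {
    "Variable A": variable_a,
    "Variable A (max)": {"unit": "EJ/yr", "method": "max"},
}

print(find_method_conflicts(inconsistent))
print(find_method_conflicts(consistent))
```

The first call reports one conflict (`max` vs. the assumed default `sum`), the second reports none.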
Only Variable A
This is the straightforward version of the above case but I wanted to mention it. Taking only the first two rows of data gives:
   model  scenario    region          variable   unit  year  value
0    m_a       s_a  region_A        Variable A  EJ/yr  2020      1
1    m_a       s_a  region_B        Variable A  EJ/yr  2020      1
2    m_a       s_a  region_C  Variable A (max)  EJ/yr  2020      1
which is correct and what we expect.
Variable A (max) in aggregation region
The final case that I could find is this one:
IamDataFrame(
    pd.DataFrame(
        [
            ["m_a", "s_a", "region_A", "Variable A", "EJ/yr", 1],
            ["m_a", "s_a", "region_B", "Variable A", "EJ/yr", 1],
            ["m_a", "s_a", "region_C", "Variable A (max)", "EJ/yr", 2],
        ],
        columns=IAMC_IDX + [2020],
    )
)
where `Variable A (max)` exists, but for the common region `region_C`. In this case we also don't get an error, since provided data always takes precedence over aggregated data, and we get:
   model  scenario    region          variable   unit  year  value
0    m_a       s_a  region_A        Variable A  EJ/yr  2020      1
1    m_a       s_a  region_B        Variable A  EJ/yr  2020      1
2    m_a       s_a  region_C  Variable A (max)  EJ/yr  2020      2
with the warning that there is a difference between aggregated and provided data for `region_C`.
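The precedence rule can be illustrated with a small stand-alone sketch. `merge_with_precedence` is a hypothetical helper working on plain dicts keyed by `(region, variable)`, and it uses Python's `warnings` module purely for illustration; the actual implementation may report the difference differently.

```python
import math
import warnings


def merge_with_precedence(provided, aggregated, rel_tol=1e-9):
    """Combine both sources; provided values win, differences trigger a warning."""
    merged = dict(aggregated)
    for key, value in provided.items():
        if key in aggregated and not math.isclose(
            value, aggregated[key], rel_tol=rel_tol
        ):
            warnings.warn(
                f"Difference between provided ({value}) and "
                f"aggregated ({aggregated[key]}) data for {key}"
            )
        merged[key] = value
    return merged


key = ("region_C", "Variable A (max)")
provided = {key: 2}    # value supplied directly for the common region
aggregated = {key: 1}  # max of "Variable A" over region_A and region_B

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    merged = merge_with_precedence(provided, aggregated)

print(merged[key], len(caught))
```

Here the provided value 2 is kept and exactly one warning is recorded, matching the result shown above.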
Summary
- The case you described in #290 would throw a pyam error, and since I've never seen it so far I'd say we can ignore it.
- The only other case we maybe should safeguard against is conflicting information between the variable mentioned in the `region-aggregation` attribute and the "original" variable entry. One way out of this could be to only allow mentioning the variable name in `region-aggregation`; all other information is then read from the original entry.
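As a rough sketch of that second option, `region-aggregation` would hold only variable names and the method would be looked up on the referenced entry. Everything here is hypothetical: the dict layout, the `resolve_aggregation_methods` helper, the `sum` default, and the choice to raise when the referenced variable has no entry of its own.

```python
def resolve_aggregation_methods(codelist, default_method="sum"):
    """Map (source variable, target variable) to the target entry's method."""
    resolved = {}
    for name, attrs in codelist.items():
        for target in attrs.get("region-aggregation", []):
            if target not in codelist:
                raise KeyError(
                    f"'{target}' (listed under '{name}') has no entry of its own"
                )
            resolved[(name, target)] = codelist[target].get(
                "method", default_method
            )
    return resolved


codelist = {
    # region-aggregation only names the target variable ...
    "Variable A": {"unit": "EJ/yr", "region-aggregation": ["Variable A (max)"]},
    # ... and the method lives in exactly one place.
    "Variable A (max)": {"unit": "EJ/yr", "method": "max"},
}

methods = resolve_aggregation_methods(codelist)
print(methods)
```

With a single source for the method, the two definitions cannot drift apart.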
@danielhuppmann, looking forward to your thoughts. I think I've thought through every case but please let me know if you've spotted an error.