pyam
pyam copied to clipboard
Regression in require behaviour
require_data
is not a drop in replacement for require_variable
. This leads to a regression in behaviour with no easy fix for users.
See script below for demonstration.
Script
import numpy as np
import pandas as pd
import pyam
test = pd.DataFrame(
np.ones((8, 3)),
columns=[2010, 2015, 2020],
index=pd.MultiIndex.from_tuples(
[
(
"scenario_a",
"model_a",
"Emissions|CO2|Waste",
"World",
"GtC / yr",
),
(
"scenario_a",
"model_a",
"Emissions|CO2|Other",
"World",
"GtC / yr",
),
(
"scenario_b",
"model_a",
"Emissions|CO2|Waste",
"World",
"GtC / yr",
),
(
"scenario_b",
"model_a",
"Emissions|CO2|Industrial",
"World",
"GtC / yr",
),
(
"scenario_a",
"model_b",
"Emissions|CO2|Other",
"World",
"GtC / yr",
),
(
"scenario_a",
"model_b",
"Emissions|CO2|Industrial",
"World",
"GtC / yr",
),
(
"scenario_b",
"model_b",
"Emissions|CO2|AFOLU",
"World",
"GtC / yr",
),
(
"scenario_b",
"model_b",
"Emissions|CO2|Industrial",
"World",
"GtC / yr",
),
],
names=[
"scenario",
"model",
"variable",
"region",
"unit",
],
),
)
test = pyam.IamDataFrame(test)
if pyam.__version__.startswith("2"):
matches_requirements = test.require_data(
variable=["Emissions|CO2|Other", "Emissions|CO2|Waste"], exclude_on_fail=True
)
print("3 scenarios fail (the ones that don't have BOTH requirement)")
print(test.exclude)
assert test.exclude.sum() == 1
else:
matches_requirements = test.require_variable(
variable=["Emissions|CO2|Other", "Emissions|CO2|Waste"], exclude_on_fail=True
)
print("Only 1 scenario fails (the one that doesn't have EITHER requirement)")
print(test.meta)
assert test.meta["exclude"].sum() == 1
Behaviour with pyam-iamc 2.0 and require_data
% pip list | grep pyam-iamc && python scratch.py
pyam-iamc 2.0.0
3 scenarios fail (the ones that don't have BOTH requirement)
model scenario
model_a scenario_a False
scenario_b True
model_b scenario_a True
scenario_b True
dtype: bool
Traceback (most recent call last):
File ".../scratch.py", line 85, in <module>
assert test.exclude.sum() == 1
^^^^^^^^^^^^^^^^^^^^^^^
AssertionError
Behaviour with pyam-iamc 1.9 and require_variable
% pip list | grep pyam-iamc && python scratch.py
pyam-iamc 1.9.0
Only 1 scenario fails (the one that doesn't have EITHER requirement)
exclude
model scenario
model_a scenario_a False
scenario_b False
model_b scenario_a False
scenario_b True
I think the basic difference is that require_variable
did an OR requirement (any match was marked as a match). require_data
is an AND requirement (all requirements had to match in order to be marked as match).
I assume that I did think through these changes a while back, but can't recollect my thoughts right now...
But from a first-principles point of view, I do think that checking all items in a list is more intuitive than any for a "requirement".
Question to me is what your use case is? Are you trying to ensure that at least one of "Waste" or "Other" is present?
But from a first-principles point of view, I do think that checking all items in a list is more intuitive than any for a "requirement".
Me too
Question to me is what your use case is? Are you trying to ensure that at least one of "Waste" or "Other" is present?
Trying to get this to behave https://github.com/iiasa/climate-assessment/pull/47 @jkikstra wrote the code that uses this and I assume was trying to do a check for one or both of them being there, but I don't actually know (see this comment for the function which calls it https://github.com/iiasa/climate-assessment/pull/47#issuecomment-1777718277)
cc @phackstock
Still not sure what the actual use case is from that comment, but I guess we could add a kwarg how={"all", "any"}
, default 'all', inspired by pandas.dropna().
As for implementation, I guess if any data is present after applying the filters (=kwargs of require_data()
) , then the "any"-requirement is satisfied for the filters.
Here's the line where it's used: https://github.com/iiasa/climate-assessment/blob/485f3d24fc646ad8d77c65ac5e787a27dc79db04/src/climate_assessment/checks.py#L788
Up to @jkikstra and @phackstock whether it's easier to add the feature back into pyam or just hack a workaround into climate-assessment
No strong feelings from my side either way. I would say in the interest of time it's better to build a workaround in climate-assessment.