Explicit list of subcategories as "components" attribute
As a follow-up to #16, we should add explicit "components" attributes to secondary-energy variables (e.g., listing which energy carriers are part of non-biomass renewables) to facilitate automated validation and consistency checks.
fyi @orichters @phackstock
This is also common practice among organizations that publish SDMX. For instance, from the IMF:
>>> import sdmx
>>> IMF = sdmx.Client("IMF")
>>> msg = IMF.codelist("CL_AREA")
>>> cl = msg.codelist["CL_AREA"]
>>> cl
<Codelist IMF:CL_AREA(1.15) (901 items): Area code list>
>>> c = cl["A2A3"]
>>> c
<Code A2A3: North and Central American countries (CDIS)>
>>> c.description
en: A2A3 = BZ + CA + CR + SV + GT + HN + MX + NI + PA + US + A2A39
In the wild, this commonly appears as a line in the description, usually the last one, but I think it would be easier to handle and parse if it were a separate annotation.
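As a rough illustration of why a free-text line is awkward, here is a minimal sketch of what recovering the components from a description like the one above would take. The helper name `components_from_description` is hypothetical, and the sketch assumes the equation is the last non-empty line of the description and uses only `=` and `+`.

```python
def components_from_description(description: str) -> tuple[str, list[str]]:
    """Split a trailing 'TOTAL = A + B + ...' line into (total, components).

    Assumption: the equation is the last non-empty line of the description.
    """
    last_line = [ln for ln in description.splitlines() if ln.strip()][-1]
    total, _, rhs = last_line.partition("=")
    if not rhs:
        raise ValueError(f"no '=' found in {last_line!r}")
    return total.strip(), [part.strip() for part in rhs.split("+")]


description = "A2A3 = BZ + CA + CR + SV + GT + HN + MX + NI + PA + US + A2A39"
total, components = components_from_description(description)
```

With a dedicated annotation, none of the guessing about which line holds the equation would be needed.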
Per @orichters' example, since spaces are very common in IAMC variable names, some form of quoting should be allowed or required.
Note that you may also end up with more complicated "summations".
Emissions|Kyoto Gases = Emissions|F-Gases + 0.265 * Emissions|N2O + 28 * Emissions|CH4 + Emissions|CO2
Might be worth considering when setting up the structure.
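To make this concrete, here is a minimal sketch of parsing such a weighted sum into (weight, variable) pairs. The function name `parse_weighted_sum` and the `weight * variable` term syntax are assumptions read off the example line, not an agreed format. It treats ` = `, ` + ` and ` * ` (with surrounding spaces) as separators, so variable names that contain spaces survive without quoting.

```python
def parse_weighted_sum(equation: str) -> tuple[str, list[tuple[float, str]]]:
    """Parse 'Total = A + 0.5 * B' into ('Total', [(1.0, 'A'), (0.5, 'B')])."""
    total, _, rhs = equation.partition(" = ")
    if not rhs:
        raise ValueError(f"no ' = ' found in {equation!r}")
    terms = []
    for term in rhs.split(" + "):
        weight, sep, name = term.partition(" * ")
        if sep:
            # Term carries an explicit weight, e.g. '28 * Emissions|CH4'.
            terms.append((float(weight), name.strip()))
        else:
            # No weight given, so the term counts once.
            terms.append((1.0, term.strip()))
    return total.strip(), terms


equation = (
    "Emissions|Kyoto Gases = Emissions|F-Gases + 0.265 * Emissions|N2O"
    " + 28 * Emissions|CH4 + Emissions|CO2"
)
total, terms = parse_weighted_sum(equation)
```

A variable name that itself contained ` + ` or ` * ` would break this approach, which is the case where quoting would become necessary.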
Thank you for your comments.
To keep the codebase simple, I strongly suggest that we stay close to standard YAML syntax and avoid custom parsing where possible. Having a variable
Population:
  components: [Population|Female, Population|Male]
or (for longer lists)
Population:
  components:
    - Population|Female
    - Population|Male
is just as readable as a string separated by special characters.
Also, this way, the arguments can be directly passed to the pyam methods that will do the processing internally, e.g., IamDataFrame.aggregate().
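A minimal sketch of what this buys us: once the YAML is loaded, the components are already a plain list, so a consistency check needs no string parsing. The dictionaries below stand in for the loaded YAML and for reported data. The values and the helper `check_components` are purely illustrative, and in practice the list would be handed to the pyam methods mentioned above rather than summed by hand.

```python
import math

# Stand-in for the variable definitions after loading the YAML above.
definitions = {
    "Population": {"components": ["Population|Female", "Population|Male"]},
}

# Made-up values for a single model/scenario/region/year.
values = {
    "Population": 100.0,
    "Population|Female": 51.0,
    "Population|Male": 49.0,
}


def check_components(variable, definitions, values, rel_tol=1e-9):
    """Return True if the variable equals the sum of its listed components."""
    components = definitions[variable]["components"]
    total = sum(values[c] for c in components)
    return math.isclose(values[variable], total, rel_tol=rel_tol)


is_consistent = check_components("Population", definitions, values)
```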
For more complex operations beyond sum, min, max or weighted average, I suggest having a dedicated Processor subclass in the nomenclature package. After all, the Kyoto GHG aggregation will require configuration such as which emissions are required, which GWP to use, etc. Let's please discuss this in a separate (new) issue in the nomenclature repository.
@danielhuppmann: I had a look now because we want to use the summation checks internally in REMIND for ScenarioMIP. With the few examples that are implemented, it works fine. Are any additions planned soon? It would help me to know what the format in the xlsx file looks like when more than one summation group is specified per variable.
In case you need some inspiration for possible summation groups, here is our list of NAVIGATE summation groups: https://github.com/pik-piam/piamInterfaces/blob/master/inst/summations/summation_groups_NAVIGATE.csv
@orichters that looks really great! Indeed, I see this causing headaches for ScenarioMIP and it would be great if this were taken up.
I really support the idea of trying to identify, wherever possible, how variables should be adding up together.
For many post-processing tools, like climate-assessment (which, quietly, assumes that "Emissions|CO2" = "Emissions|CO2|AFOLU" + "Emissions|CO2|Energy and Industrial Processes" + "Emissions|CO2|Other" + "Emissions|CO2|Waste"), it is important to know what variables are supposed to form a complete set together.
Providing guidance to models on expectations here would be a very nice step towards better aligning results across models.
This becomes more pressing as the variable list expands and as multiple different ways of summing become possible. "Var" = sum("Var|*") is hardly ever true.
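A small sketch of one reason the wildcard rule fails: `Var|*` also matches sub-subcategories, so nested variables get counted twice. The fourth variable name below and all values are made up for illustration.

```python
from fnmatch import fnmatchcase

values = {
    "Emissions|CO2": 40.0,
    "Emissions|CO2|AFOLU": 5.0,
    "Emissions|CO2|Energy and Industrial Processes": 35.0,
    # Hypothetical deeper level, already contained in the line above.
    "Emissions|CO2|Energy and Industrial Processes|Energy": 30.0,
}


def wildcard_sum(variable, values):
    """Sum everything matching 'variable|*', at any depth."""
    pattern = variable + "|*"
    return sum(v for name, v in values.items() if fnmatchcase(name, pattern))


def explicit_sum(components, values):
    """Sum only the explicitly listed components."""
    return sum(values[c] for c in components)


naive = wildcard_sum("Emissions|CO2", values)
explicit = explicit_sum(
    ["Emissions|CO2|AFOLU", "Emissions|CO2|Energy and Industrial Processes"],
    values,
)
```

Here the wildcard sum picks up the nested energy variable a second time, while the explicit list reproduces the reported total.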
@phackstock @danielhuppmann