
Explicit list of subcategories as "components" attribute

danielhuppmann opened this issue 2 years ago (status: Open, 6 comments)

As a follow-up to #16, we should add explicit "components" attributes to secondary-energy variables (e.g., listing which energy carriers are part of non-biomass renewables) to facilitate automated validation and consistency checks.

fyi @orichters @phackstock

danielhuppmann, Sep 28 '23

  • Not only Secondary Energy: this is what we use to check REMIND submissions (AR6, NAVIGATE).
  • Note that it is best to implement this so that multiple summation groups can exist for the same variable, for example:
    • Population = Population|Female + Population|Male
    • Population = Population|Rural + Population|Urban
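A minimal sketch of how a checker could handle several summation groups per variable. The layout (a total mapped to a list of component lists), the function name `check_summation_groups`, the relative tolerance and the numbers are all hypothetical and chosen only for illustration; this is not the nomenclature or pyam implementation.

```python
from typing import Dict, List

# Hypothetical layout: each total maps to a list of summation groups, and each
# group lists component variables that should add up to that total.
SUMMATION_GROUPS: Dict[str, List[List[str]]] = {
    "Population": [
        ["Population|Female", "Population|Male"],
        ["Population|Rural", "Population|Urban"],
    ],
}


def check_summation_groups(
    data: Dict[str, float],
    groups: Dict[str, List[List[str]]],
    rtol: float = 1e-3,
) -> List[str]:
    """Return one message per summation group whose components miss the total."""
    failures = []
    for total, group_list in groups.items():
        if total not in data:
            continue  # total not reported: nothing to compare against
        for components in group_list:
            if not all(c in data for c in components):
                continue  # incomplete group: treated as "not checkable" here
            subtotal = sum(data[c] for c in components)
            if abs(subtotal - data[total]) > rtol * abs(data[total]):
                failures.append(
                    f"{total} = {data[total]} but {' + '.join(components)} = {subtotal}"
                )
    return failures


# Illustrative numbers only: the rural/urban split does not add up.
example = {
    "Population": 100.0,
    "Population|Female": 51.0,
    "Population|Male": 49.0,
    "Population|Rural": 40.0,
    "Population|Urban": 55.0,
}
failures = check_summation_groups(example, SUMMATION_GROUPS)
print(failures)  # ['Population = 100.0 but Population|Rural + Population|Urban = 95.0']
```

Each group is checked independently, so a model that reports only one of the two splits is still validated on the split it does report.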

orichters, Sep 28 '23

This is also common practice among organizations that provide SDMX, for instance the IMF:

```python
>>> import sdmx
>>> IMF = sdmx.Client("IMF")
>>> msg = IMF.codelist("CL_AREA")
>>> cl = msg.codelist["CL_AREA"]
>>> cl
<Codelist IMF:CL_AREA(1.15) (901 items): Area code list>
>>> c = cl["A2A3"]
>>> c
<Code A2A3: North and Central American countries (CDIS)>
>>> c.description
en: A2A3 = BZ + CA + CR + SV + GT + HN + MX + NI + PA + US + A2A39
```

In the wild, it seems common for this to be a line in the description, usually the last one; but I think it would be easier to handle and parse as a separate annotation.
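To illustrate why a free-text convention is fragile, here is a minimal sketch that recovers the summation from the last non-empty line of a description string. The function name is hypothetical, the "last line" rule is the convention described above rather than anything SDMX guarantees, and the input is the English text from the session above, assuming `en:` is only the language label shown when the description is displayed.

```python
from typing import List, Optional, Tuple


def summation_from_description(description: str) -> Optional[Tuple[str, List[str]]]:
    """Extract 'TOTAL = A + B + ...' from the last non-empty line of a description.

    Returns None if that line does not look like a summation.
    """
    lines = [ln.strip() for ln in description.splitlines() if ln.strip()]
    if not lines:
        return None
    total, sep, rhs = lines[-1].partition("=")
    if not sep:
        return None  # last line is ordinary prose, not a summation
    return total.strip(), [part.strip() for part in rhs.split("+")]


# English text of the description above ("en:" assumed to be a display label).
desc = "A2A3 = BZ + CA + CR + SV + GT + HN + MX + NI + PA + US + A2A39"
total, components = summation_from_description(desc)
print(total, len(components))  # A2A3 11
```

Every consumer has to re-implement this guesswork (which line, which separators); a dedicated annotation holding the component list would remove it.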

Per @orichters' example, since spaces are very common in IAMC variable names, some form of quoting should be allowed or required.
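One possible quoting rule, sketched under the assumption that POSIX-shell-style double quotes are acceptable (the thread does not fix a syntax): Python's standard `shlex.split` keeps quoted names with spaces together, and unquoted names with spaces are rejected instead of being silently misread. The function name `parse_quoted_sum` is hypothetical.

```python
import shlex
from typing import List, Tuple


def parse_quoted_sum(expr: str) -> Tuple[str, List[str]]:
    """Parse 'TOTAL = A + B + ...' where names containing spaces must be quoted."""
    tokens = shlex.split(expr)  # splits on whitespace, keeps "quoted text" together
    if len(tokens) < 3 or tokens[1] != "=":
        raise ValueError(f"expected 'TOTAL = A + B ...', got {tokens}")
    rhs = tokens[2:]
    components, operators = rhs[0::2], rhs[1::2]
    # Names and '+' must alternate, starting and ending with a name.
    if (
        len(components) != len(operators) + 1
        or any(op != "+" for op in operators)
        or "+" in components
    ):
        raise ValueError(f"malformed right-hand side: {rhs}")
    return tokens[0], components


total, parts = parse_quoted_sum(
    '"Emissions|CO2" = "Emissions|CO2|Energy and Industrial Processes" + Emissions|CO2|AFOLU'
)
print(parts)  # ['Emissions|CO2|Energy and Industrial Processes', 'Emissions|CO2|AFOLU']

# Without quotes, the spaces split the name and the expression is rejected.
try:
    parse_quoted_sum("Emissions|CO2 = Emissions|CO2|Energy and Industrial Processes")
except ValueError as err:
    print("rejected:", err)
```

Names without spaces (e.g. `Population|Female`) need no quotes under this rule, so the simple cases stay readable.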

khaeru, Dec 06 '23

Note that you may also end up with more complicated "summations".

Emissions|Kyoto Gases = Emissions|F-Gases + 0.265 * Emissions|N2O + 28 * Emissions|CH4 + Emissions|CO2

This might be worth considering when setting up the structure.
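A minimal sketch of what such a weighted "summation" turns into once parsed: a mapping from component to coefficient, where a plain sum is the special case with every weight equal to 1. The parser name is hypothetical; the coefficients are copied verbatim from the formula above and no units are interpreted. Splitting on `" + "` (with spaces) keeps hyphenated names such as `Emissions|F-Gases` intact.

```python
from typing import Dict, Tuple


def parse_weighted_sum(expr: str) -> Tuple[str, Dict[str, float]]:
    """Parse 'TOTAL = A + 0.265 * B + ...' into (TOTAL, {component: coefficient})."""
    total, sep, rhs = expr.partition("=")
    if not sep:
        raise ValueError(f"no '=' in {expr!r}")
    weights: Dict[str, float] = {}
    # Split on ' + ' (with spaces) so hyphens inside names like 'F-Gases' are untouched.
    for term in rhs.split(" + "):
        coef, star, name = term.partition("*")
        if star:
            weights[name.strip()] = float(coef)  # float() tolerates surrounding spaces
        else:
            weights[term.strip()] = 1.0  # plain component, implicit weight of 1
    return total.strip(), weights


total, weights = parse_weighted_sum(
    "Emissions|Kyoto Gases = Emissions|F-Gases + 0.265 * Emissions|N2O"
    " + 28 * Emissions|CH4 + Emissions|CO2"
)
print(weights)
# {'Emissions|F-Gases': 1.0, 'Emissions|N2O': 0.265, 'Emissions|CH4': 28.0, 'Emissions|CO2': 1.0}
```

A list-only `components` attribute cannot carry these coefficients, which is one argument for handling such aggregates separately.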

orichters, Dec 06 '23

Thank you for your comments.

To keep the codebase simple, I strongly suggest that we stay close to standard YAML syntax and avoid custom parsing where possible. Having a variable

```yaml
Population:
    components: [Population|Female, Population|Male]
```

or (for longer lists)

```yaml
Population:
    components:
        - Population|Female
        - Population|Male
```

is just as readable as a string separated by special characters.
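Because both YAML forms load to the same Python list, validation needs no string parsing at all. A minimal sketch, assuming a simplified codelist shaped as a plain mapping (the real nomenclature file layout may differ) and written out directly as the dict a YAML loader such as PyYAML's `yaml.safe_load` would return; the two leaf entries and the function name are hypothetical.

```python
from typing import Dict, List

# What a YAML loader would return for a simplified codelist containing the
# example above; flow style [a, b] and block style "- a" give the same list.
# Leaf variables written as "Name:" with no attributes load as None.
codelist = {
    "Population": {"components": ["Population|Female", "Population|Male"]},
    "Population|Female": None,
    "Population|Male": None,
}


def undefined_components(codelist: dict) -> Dict[str, List[str]]:
    """Map each variable to the listed components that the codelist does not define."""
    problems: Dict[str, List[str]] = {}
    for name, attrs in codelist.items():
        components = (attrs or {}).get("components", [])
        if not isinstance(components, list) or not all(isinstance(c, str) for c in components):
            raise TypeError(f"{name}: 'components' must be a list of variable names")
        missing = [c for c in components if c not in codelist]
        if missing:
            problems[name] = missing
    return problems


print(undefined_components(codelist))  # {}
broken = {k: v for k, v in codelist.items() if k != "Population|Male"}
print(undefined_components(broken))  # {'Population': ['Population|Male']}
```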

Also, this way the arguments can be passed directly to the pyam methods that do the processing internally, e.g., IamDataFrame.aggregate().

For more complex operations beyond sum, min, max or weighted average, I suggest having a dedicated Processor subclass in the nomenclature package; after all, the Kyoto GHG aggregation will require configuration such as which emissions are required, which GWP to use, etc. Let's please discuss this as a separate (new) issue in the nomenclature repository.
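A minimal sketch of the kind of configuration such a processor would carry. The class below is hypothetical and deliberately standalone; it is NOT the nomenclature `Processor` API. The GWP factors default to the coefficients in the Kyoto-gases formula earlier in this thread, and the choice of which species are required (CO2, CH4, N2O) versus optional (F-Gases, counted as zero if absent) is an assumption made only for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class KyotoGasesAggregator:
    """Hypothetical configurable aggregator (not the nomenclature Processor API)."""

    # GWP factors: defaults copied from the formula earlier in this thread;
    # a different GWP set would be passed in as configuration.
    gwp: Dict[str, float] = field(
        default_factory=lambda: {
            "Emissions|CO2": 1.0,
            "Emissions|CH4": 28.0,
            "Emissions|N2O": 0.265,
            "Emissions|F-Gases": 1.0,
        }
    )
    # Species that must be reported (assumed here for illustration).
    required: Tuple[str, ...] = ("Emissions|CO2", "Emissions|CH4", "Emissions|N2O")
    target: str = "Emissions|Kyoto Gases"

    def apply(self, data: Dict[str, float]) -> Dict[str, float]:
        """Return a copy of data with the weighted aggregate added as self.target."""
        missing = [v for v in self.required if v not in data]
        if missing:
            raise ValueError(f"cannot compute {self.target}, missing: {missing}")
        # Optional species (here: F-Gases) count as zero if not reported.
        total = sum(f * data.get(v, 0.0) for v, f in self.gwp.items())
        return {**data, self.target: total}


# Illustrative values only.
result = KyotoGasesAggregator().apply(
    {"Emissions|CO2": 100.0, "Emissions|CH4": 2.0, "Emissions|N2O": 100.0}
)
print(result["Emissions|Kyoto Gases"])  # approximately 182.5
```

Keeping the GWP set and the required species as explicit fields is what a plain `components` list cannot express, which is why a separate processor seems a better home for this.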

danielhuppmann, Dec 11 '23

@danielhuppmann: I had a look now because we want to use the summation checks internally in REMIND for ScenarioMIP. With the few examples that are implemented, it works fine. Are any additions planned soon? It would help me to know what the format in the xlsx file will look like if more than one summation group per variable is specified.

In case you need some inspiration for possible summation groups, here is our list of NAVIGATE summation groups: https://github.com/pik-piam/piamInterfaces/blob/master/inst/summations/summation_groups_NAVIGATE.csv

orichters, Sep 11 '24

@orichters that looks really great! Indeed, I can see this causing headaches for ScenarioMIP, and it would be great if this were taken up.

I really support the idea of identifying, wherever possible, how variables should add up. For many post-processing tools, like climate-assessment (which, quietly, assumes that "Emissions|CO2" = "Emissions|CO2|AFOLU" + "Emissions|CO2|Energy and Industrial Processes" + "Emissions|CO2|Other" + "Emissions|CO2|Waste"), it is important to know which variables are supposed to form a complete set together.
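A minimal sketch of a completeness check for the decomposition that, per the comment above, climate-assessment assumes. The function name and the emission values are hypothetical; the point is that a tool relying on the four components would silently lose whatever part of the total the missing ones represent.

```python
from typing import Dict, List, Tuple

# The decomposition that, per the comment above, climate-assessment assumes.
CO2_COMPONENTS = [
    "Emissions|CO2|AFOLU",
    "Emissions|CO2|Energy and Industrial Processes",
    "Emissions|CO2|Other",
    "Emissions|CO2|Waste",
]


def completeness_report(
    data: Dict[str, float], total: str, components: List[str]
) -> Tuple[List[str], float]:
    """Return (components not reported, part of the total they leave unexplained)."""
    missing = [c for c in components if c not in data]
    explained = sum(data[c] for c in components if c in data)
    return missing, data[total] - explained


# Illustrative values only.
reported = {
    "Emissions|CO2": 40.0,
    "Emissions|CO2|AFOLU": 5.0,
    "Emissions|CO2|Energy and Industrial Processes": 33.0,
}
print(completeness_report(reported, "Emissions|CO2", CO2_COMPONENTS))
# (['Emissions|CO2|Other', 'Emissions|CO2|Waste'], 2.0)
```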

Providing guidance to models on expectations here would be a very nice step towards better aligning results across models.

This becomes more pressing as the variable list expands and multiple different ways of summing become possible. "Var" = sum("Var|*") is hardly ever true.
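A small illustration of why the wildcard rule fails, using Python's `fnmatch` as a stand-in for a `Var|*` pattern (an assumption: tools may restrict wildcards to one hierarchy level, but even then both Population splits sit at the same level, so the result would be the same here). Values are illustrative.

```python
from fnmatch import fnmatchcase

# Illustrative values: Population reported with two independent splits.
data = {
    "Population": 100.0,
    "Population|Female": 51.0,
    "Population|Male": 49.0,
    "Population|Rural": 45.0,
    "Population|Urban": 55.0,
}

# Naive rule: "Var" = sum("Var|*"). fnmatch's '*' also matches '|', so this
# picks up every descendant, at any depth and from every split.
naive = sum(v for k, v in data.items() if fnmatchcase(k, "Population|*"))

# Explicit summation groups, as proposed in this thread, keep the splits apart.
groups = [["Population|Female", "Population|Male"], ["Population|Rural", "Population|Urban"]]
per_group = [sum(data[c] for c in g) for g in groups]

print(naive, per_group)  # 200.0 [100.0, 100.0]
```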

@phackstock @danielhuppmann

jkikstra, Sep 26 '24