pyam
Enhanced metadata representation - design choices
Resurrecting what is perhaps a bit of an oldie but a goodie: #126
Prompt
I find myself regularly wanting to carry around, refer back to, graph, and manipulate metadata with more dimensions than our current df.meta indices. Classic examples are net-zero years and derivative outputs like cumulative sums at different time points.
At the moment, I follow a pattern that looks roughly like:
for region in data.region:
    data.set_meta_from_data(
        name=f'Cumulative {variable} in {region} until {ylabel}',
        method='sum',
        region=region,
    )
This creates N_v x N_r x N_y columns in the wide-form metadata frame.
When I want to work with these, I need to either specify them by name or pivot the data later (in effectively every single instance so far). I've found this burdensome and am debating which alternative I could use in the near term.
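To illustrate the pivot step, here is a minimal sketch using plain pandas on a toy stand-in for the wide-form meta frame. The column names follow the naming pattern from the snippet above; the models, scenarios, and values are made up for illustration, and the regular expression assumes that neither the variable nor the region name contains " in " or " until ".

```python
import pandas as pd

# Toy stand-in for a wide-form meta frame (hypothetical values)
index = pd.MultiIndex.from_tuples(
    [("model_a", "scen_1"), ("model_a", "scen_2")],
    names=["model", "scenario"],
)
meta = pd.DataFrame(
    {
        "Cumulative Emissions|CO2 in World until 2050": [100.0, 80.0],
        "Cumulative Emissions|CO2 in World until 2100": [150.0, 90.0],
        "Cumulative Emissions|CO2 in Asia until 2050": [40.0, 30.0],
    },
    index=index,
)

# Recover the dimensions that were encoded in the column names
pattern = r"^Cumulative (?P<variable>.+) in (?P<region>.+) until (?P<year>\d+)$"

# Melt to long form, then split each column name into its parts
tidy = meta.reset_index().melt(id_vars=["model", "scenario"], var_name="name")
parts = tidy["name"].str.extract(pattern)
tidy = pd.concat([tidy.drop(columns="name"), parts], axis=1)
tidy["year"] = tidy["year"].astype(int)

# One value per (model, scenario, variable, region, year)
tidy = tidy.set_index(["model", "scenario", "variable", "region", "year"])["value"]
```

Having to reverse-engineer dimensions from string labels like this is exactly the overhead described above.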
Choices
1. support a different data structure for scalar values
2. indicate to users that these use cases should leverage the actual data rather than metadata (I don't know the full ramifications here)
I was initially leaning towards 1 above, but noticed while writing this that my specific use case could also be supported by 2, because here I have variable, region, and year indices.
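As a rough sketch of what 2 could look like, the cumulative values can be stored as an additional variable in the long-form data itself. This uses plain pandas on a simplified long table (no unit column, made-up values) rather than any pyam API, and it simply sums the reported values per group without weighting by the length of each time step.

```python
import pandas as pd

# Simplified long-form timeseries data (hypothetical values)
data = pd.DataFrame(
    {
        "model": ["model_a"] * 4,
        "scenario": ["scen_1"] * 4,
        "region": ["World", "World", "Asia", "Asia"],
        "variable": ["Emissions|CO2"] * 4,
        "year": [2050, 2100, 2050, 2100],
        "value": [100.0, 50.0, 40.0, 10.0],
    }
)

group_cols = ["model", "scenario", "region", "variable"]

# Running sum over the reported years within each timeseries
cumulative = data.sort_values("year").copy()
cumulative["value"] = cumulative.groupby(group_cols)["value"].cumsum()
cumulative["variable"] = "Cumulative " + cumulative["variable"]

# The derived rows share the same dimensions as the original data
combined = pd.concat([data, cumulative], ignore_index=True)
```

The result keeps variable, region, and year as regular columns, so no pivoting or name parsing is needed afterwards.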
Open to thoughts/suggestions!
Two considerations that make me quite skeptical of option 1:
- How would this work if various meta-indicators have different dimensions? Some sort of dictionary or nested pandas.DataFrame objects?
- How would a different data structure for multi-dimensional meta-indicators be saved to file?
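To make these two questions concrete, one conceivable approach (purely an illustration, not an existing pyam feature) is a single long table with an "indicator" column, where dimensions that an indicator does not use are left empty. The sketch below uses plain pandas with hypothetical indicator names and values, and writes to an in-memory buffer in place of a file.

```python
import io

import pandas as pd

# A scalar indicator per scenario (hypothetical values)
net_zero = pd.DataFrame(
    {
        "model": ["model_a"],
        "scenario": ["scen_1"],
        "indicator": ["Net-zero year"],
        "value": [2055.0],
    }
)

# An indicator with additional region and year dimensions
cumulative = pd.DataFrame(
    {
        "model": ["model_a", "model_a"],
        "scenario": ["scen_1", "scen_1"],
        "indicator": ["Cumulative Emissions|CO2"] * 2,
        "region": ["World", "World"],
        "year": [2050, 2100],
        "value": [100.0, 150.0],
    }
)

# Stack into one table; dimensions an indicator lacks become NaN
combined = pd.concat([net_zero, cumulative], ignore_index=True, sort=False)

# Round-trip through CSV (an in-memory buffer stands in for a file)
buffer = io.StringIO()
combined.to_csv(buffer, index=False)
buffer.seek(0)
restored = pd.read_csv(buffer)
```

The trade-off is visible in the result: the table saves cleanly, but missing dimensions show up as empty cells, and the year column is no longer integer-typed once some rows lack a year.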
Minor quibble: "metadata" is usually used for information like authors, license, title, ... I try to use "meta indicators" for scenario-related datapoints other than timeseries data.