message_ix icon indicating copy to clipboard operation
message_ix copied to clipboard

Limit read_excel() to existing elements for set 'year' etc.

Open OFR-IIASA opened this issue 4 years ago • 2 comments

read_excel allows the upload of data for years not defined in the scenario, this creates issues and therefore either the data-upload should be rejected or a new scenario with additional timesteps using the function add_year should be created.

If a scenario has the years defined:

scen.set('year').tolist()

# This yields
[1960, 1970, 1980, 1990, 1995, 2000, 2005, 2010, 2015, 2020, 2025, 2030, 2035, 2040, 2045, 2050, 2060, 2070, 2080, 2090, 2100, 2110]

And an data is added using the function read_excel:

scen.read_excel('<name_of_my_file.xlsx>, add_units=True, init_items=True)

# And then the duration_period parameter is retrieved, the following is the result
year  value unit
0   1970   10.0    y
1   1980   10.0    y
2   1990   10.0    y
3   1995    5.0    y
4   2000    5.0    y
5   2005    5.0    y
6   2010    5.0    y
7   2015    5.0    y
8   2020    5.0    y
9   2025    5.0    y
10  2030    5.0    y
11  2035    5.0    y
12  2040    5.0    y
13  2045    5.0    y
14  2050    5.0    y
15  2060   10.0    y
16  2070   10.0    y
17  2080   10.0    y
18  2090   10.0    y
19  2100   10.0    y
20  2110   10.0    y
21  1960   10.0    y
22  1955    5.0    y
23  1965   10.0    y
24  1975   10.0    y
25  1985   10.0    y
26  2055   70.0    y
27  1950 -160.0    y

OFR-IIASA avatar Jun 25 '21 09:06 OFR-IIASA

Some thoughts:

  1. ixmp.Scenario.read_excel() was originally written to support reading all data (sets and parameters) for a scenario into an empty Scenario object. The newer implementation in ixmp.backend.io and tests are for this behaviour.
  2. It's nice that the code is flexible enough so that it works in updating a Scenario that already contains data. But this is a novel usage: there was never any discussion of how it should work, and there are no tests for it.
  3. The code is in ixmp, and it doesn't know how to give preferential treatment to different sets and parameters that are only defined in message_ix, e.g. year.

We can imagine many different use-cases like (2): A. The one described above: add or modify parameter values, but do not add values to year (or any set? Only some sets?) B. The user has a file that describes one technology to be added to an existing model. The technology does not exist in the model. The sheet for the technology set contains the ID of the new technology to be added; other sheets contain parameter data that reference this new set element.

We could consider some enhancements:

  • A new option to ixmp.backend.io.s_read_excel to read only parameter data, and no elements for any set.

  • Enhance ixmp.backend.base.Backend.read_file, s_read_excel, and Scenario.read_excel to accept/pass/handle a filters= keyword argument that allows to specify, e.g. filters=dict(year=self.set("year"), …) and filters the data read from file before adding it.

    In this case, further enhancements could be added by overriding Scenario.read_excel() in message_ix to apply defaults to this argument, e.g. for year, year_vtg, year_act, etc.

In any case, the code can't distinguish between (A) and (B) vs. (1); the user would always need to indicate the desired behaviour somehow.

In the meantime, the docs for ixmp and/or message_ix could be improved to explain (1), and also link to this issue.

khaeru avatar Jun 25 '21 11:06 khaeru

Maybe the slightly related issue #163 could also be handled simultaneously with this one.

LauWien avatar Jun 25 '21 12:06 LauWien