Adjust calculation of PRICE_EMISSION
This PR revises the calculation of the PRICE_EMISSION to correctly account for the period-duration when deriving the period-specific PRICE_EMISSION from a single marginal value which covers multiple periods (cumulative constraint).
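To illustrate the kind of adjustment involved, here is a minimal, purely illustrative sketch (not the actual GAMS formulation) of turning the single marginal of a cumulative constraint into period-specific prices by dividing out each period's discounted duration. The names `df_period` and `duration_period` echo MESSAGEix parameters, but the arithmetic below is only a plausible reading, not the code in this PR:

```python
# Illustrative only: derive period-specific emission prices from the single
# dual value of a cumulative constraint. The exact MESSAGEix formula may
# differ; this just shows why period duration and discounting must enter.
r = 0.05                                   # assumed discount rate
periods = {2020: 10, 2030: 10, 2040: 10}   # period -> duration_period (years)

def df(year, base=2020):
    """Annual discount factor relative to the base year."""
    return (1 + r) ** -(year - base)

# df_period: discounted duration of each period (sum of annual factors)
df_period = {
    y: sum(df(y + k) for k in range(dur)) for y, dur in periods.items()
}

marginal = 42.0  # single dual value of the cumulative emission constraint
price_emission = {y: marginal / df_period[y] for y in periods}
```

With a constant discount rate, the resulting undiscounted price grows at the discount rate between periods — the Hotelling-like behavior that comes up later in this thread.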
## How to review
- Read the diff and note that the CI checks all pass.
- Build the documentation and look at revised documentation section.
- Ensure that changes/additions are self-documenting, i.e. that another developer (someone like the reviewer) will be able to understand what the code does in the future.
## PR checklist
- [ ] Continuous integration checks all ✅
- [ ] Add or expand tests; coverage checks both ✅
- [ ] Add, expand, or update documentation.
- [ ] Update release notes.
> [!NOTE]
> I'm editing this description to keep track of what has happened in this PR and what's still required. My edits are restricted to below this point.
## Notes on progress
On January 22, 2025, @glatterf42 went through the posts here and marked everything as outdated that's no longer relevant to finishing this PR. Several things are still outstanding:
- The documentation might hold warnings about this issue or explanations about how `PRICE_EMISSION` is calculated, and needs to be updated in these cases.
- The release notes need to be updated, too.
- The tests are not sufficient and not passing.
This last point is really the main issue here and the focus of (almost) all posts not marked as outdated: what kind of test do we want to have for the PRICE_EMISSION feature, and how do we write it so that it passes?
@OFR-IIASA outlined five test cases that would be sufficient, and four of them are already present and passing. However, the fifth case is still missing, even after several tries by @behnam-zakeri, @yiyi1991, and @glatterf42. The Jupyter notebook in this PR was only used for testing and debugging this work and should be dropped before merging this PR to main.
Thanks to discussions with the MESSAGE team, we believe that the issue lies in the complexity of the scenarios used for testing. When using the PRICE_EMISSION fix at hand with full global models, both @OFR-IIASA and @yiyi1991 confirmed that the fifth test case is passing fine. Thus, our assumption is that the test is not passing here because the scenario used is too simple: for example, with an insufficient number of technologies defined, the model might decide to use the cheap, high emission technology as long as it can before switching suddenly to the expensive, low emission technology. It seems that such pointwise behavior (as opposed to a shift on a continuum of technologies) causes the duality we want to test for to no longer hold.
One possible workaround is to use a global model for testing -- and we already do that: our nightly tests load the full global snapshot from zenodo. Since this is a large model, that takes a significant amount of time, about 45 minutes (estimated by @yiyi1991). Thus, we don't want to include this in our regular CI suite. We could instead think about migrating this test to the nightly tests, where more runtime is not all that problematic. Alternatively, we could also invest some time to develop/fine-tune a scenario just for this missing duality test.
Note that there are 3 relevant failing tests (in `test_feature_price_emission.py`), but also 2 failing tests not related to this PR (in `tests/test_report.py` and `tests/test_tutorials.py`).
I'm not so sure about that.
For `test_report.py`, this is the error message:

```
    # message_ix.Reporter can also be initialized
    rep = Reporter.from_scenario(scen)

    # Number of quantities available in a rudimentary MESSAGEix Scenario
>   assert 268 == len(rep.graph["all"])
E   assert 268 == 272
E    +  where 272 = len([<ACT:nl-t-yv-ya-m-h>, <ACTIVITY_BOUND_ALL_MODES_LO-margin:n-t-y-h>, <ACTIVITY_BOUND_ALL_MODES_LO:n-t-y-h>, <ACTIVITY_BOUND_ALL_MODES_UP-margin:n-t-y-h>, <ACTIVITY_BOUND_ALL_MODES_UP:n-t-y-h>, <ACTIVITY_BOUND_LO-margin:>, ...])
```
Here, we have a hardcoded number of quantities we expect in a MESSAGEix Scenario. However, this PR changes the number of quantities by adding these lines to `message_ix/models.py`:

```python
par("df_year", "y")
par("df_period", "y")
par("levelized_cost", "n t y h")
var(
    "PRICE_EMISSION_NEW",
    "n type_emission type_tec y",
    "TEMPORARY test for Emission price fix",
)
```
Now, I can't see the whole output of what `len()` is measuring, but everything that appears there is a quantity defined in `message_ix/models.py`, sorted alphabetically. And since our expected number is also off by exactly four, my guess is that this error is related.
To fix it, we could simply raise the expected number to 272. However, we probably don't want to keep `PRICE_EMISSION_NEW` around; it should ideally be merged back into `PRICE_EMISSION` during the clean-up work that this PR requires. I don't know about the other quantities, but if they remain, the solution here is indeed to just increase the expected number of quantities :)
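One way to keep such an assertion self-documenting after changes like this is to derive the expected count from the previous baseline plus the quantities the PR adds. A hedged sketch of that pattern (the names come from the diff above, but this is a suggestion, not code from the PR):

```python
# Suggested pattern (not code from this PR): record *why* the expected
# number of quantities changed instead of silently bumping a magic number.
QUANTITIES_BEFORE_PR = 268  # value asserted before this PR
ADDED_IN_PR = ["df_year", "df_period", "levelized_cost", "PRICE_EMISSION_NEW"]

expected = QUANTITIES_BEFORE_PR + len(ADDED_IN_PR)
# The test would then assert: len(rep.graph["all"]) == expected
```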
For `test_tutorials.py`, I'm not entirely sure what the fix is. Let's start with the complete traceback again:
```
nb_path = PosixPath('/home/runner/work/message_ix/message_ix/tutorial/westeros/westeros_report.ipynb')
checks = [('len-rep-graph', 13724)]
tmp_path = PosixPath('/tmp/pytest-of-runner/pytest-0/popen-gw1/test_tutorial_westeros_report_5')
tmp_env = environ({'_R_CHECK_SYSTEM_CLOCK_': 'FALSE', 'SELENIUM_JAR_PATH': '/usr/share/java/selenium-server.jar', 'CONDA': '/usr...FRAME_EVAL': 'NO', 'PYTEST_CURRENT_TEST': 'message_ix/tests/test_tutorials.py::test_tutorial[westeros_report] (call)'})

    @pytest.mark.parametrize("nb_path,checks", TUTORIALS, ids=IDS, indirect=["nb_path"])
    def test_tutorial(nb_path, checks, tmp_path, tmp_env):
        """Test tutorial in the IPython or IR notebook at `nb_path`.

        If `checks` are given, values in the specified cells are tested.
        """
        if nb_path.name == "westeros_baseline_using_xlsx_import_part2.ipynb":
            # Copy data files used by this tutorial to `tmp_path`
            for name in DATA_FILES:
                copyfile(nb_path.parent / name, tmp_path / name)

        # Determine arguments for run_notebook()
        args = default_args()
        if nb_path.name.startswith("R_"):
            args.update(kernel_name="IR")

        # The notebook can be run without errors
        nb, errors = run_notebook(nb_path, tmp_path, tmp_env, **args)
        assert errors == []

        # Cell(s) identified by name or index have a particular value
        for cell, value in checks:
>           assert np.isclose(get_cell_output(nb, cell), value)
E           assert False
E            +  where False = <function isclose at 0x7f36903354c0>(13775, 13724)
E            +    where <function isclose at 0x7f36903354c0> = np.isclose
E            +  and   13775 = get_cell_output({'cells': [{'attachments': {}, 'cell_type': 'markdown', 'metadata': {}, 'source': '# Westeros Tutorial - Introducing reporting\n\n‘Reporting’ is the term used in the MESSAGEix ecosystem to refer to any calculations performed _after_ the MESSAGE mathematical optimization problem has been solved.\n\nThis tutorial introduces the reporting features provided by the ``ixmp`` and ``message_ix`` packages.\nIt was developed by Paul Natsuo Kishimoto ([@khaeru](https://github.com/khaeru)) for a MESSAGEix training workshop held at IIASA in October 2019.\nParticipants in the MESSAGEix workshops of June and September 2020 contributed feedback.\n<!-- Add a line here if you revise the tutorial! -->\n\n**Pre-requisites**\n- You have the *MESSAGEix* framework installed and working\n In particular, you should have installed ``message_ix[report]``, which requires ``ixmp[report]``, ``genno[compat]``, and ``plotnine``\n- Complete tutorial Part 1 (``westeros_baseline.ipynb``)\n - Understand the following MESSAGEix terms: ‘variable’, ‘parameter’\n- Open the [‘Reporting’ page in the MESSAGEix documentation](https://docs.messageix.org/en/stable/reporting.html); bookmark it or keep it open in a tab.\n S...pe': 'stream', 'name': 'stdout', 'text': "'E':\n- <lambda>\n- 'A':\n - 1\n- 'C':\n - sum_calc\n - 'A' (above)\n - 'B':\n - 2\n- 'D':\n - 12\n"}, {'output_type': 'execute_result', 'metadata': {}, 'data': {'text/plain': '5.0'}, 'execution_count': 31}], 'source': 'rep.add("D", 12)\nrep.add("E", (lambda a, c, d: a + d * (a / c), "A", "C", "D"))\nprint(rep.describe("E"))\n\nrep.get("E")\n'}, {'cell_type': 'code', 'execution_count': 32, 'metadata': {'execution': {'iopub.status.busy': '2023-12-01T09:49:15.984386Z', 'iopub.execute_input': '2023-12-01T09:49:15.984888Z', 'iopub.status.idle': '2023-12-01T09:49:16.144007Z', 'shell.execute_reply': '2023-12-01T09:49:16.142901Z'}}, 'outputs': [], 'source': 'mp.close_db()\n'}], 'metadata': {'kernelspec': {'display_name': 'message_ix', 'language': 'python', 'name': 'python3'}, 'language_info': {'name': 'python', 'version': '3.8.18', 'mimetype': 'text/x-python', 'codemirror_mode': {'name': 'ipython', 'version': 3}, 'pygments_lexer': 'ipython3', 'nbconvert_exporter': 'python', 'file_extension': '.py'}, 'vscode': {'interpreter': {'hash': '29605b568787f5559ca1ba05da75d155c3dcfa15a1a63b1885b057c5465ade2e'}}}, 'nbformat': 4, 'nbformat_minor': 2}, 'len-rep-graph')
```
As we can see, the error again relates to reporting: another hardcoded expected number is compared to an actual number. Looking at our test setup, we are only checking the result of a single cell in this notebook; the cell `len-rep-graph` should be equal to 13724. I'm not sure how the names of cells are created, but looking at the tutorial in question, my guess is it's this one:

```python
len(rep.graph)
```
So this seems related, too. My suggestion for this is very similar to the error above: clean up this PR and see what value comes out (e.g. by quickly running the tutorial), then use that to update the expected value.
This will be postponed together with the originating issue since we don't have enough time for a proper cleanup right now.
As discussed in the weekly meeting on February 15, @behnam-zakeri will continue this PR. Please let me know if I can assist with anything here.
@glatterf42 thanks for following up on this in today's meeting. I went through the tests, and they are actually not using the Westeros tutorials but building new models for testing this.
I started to test the changes in this PR using Westeros tutorials. I followed the logic below:
- Clone a "baseline" (scenario I) to a new "emission bound" scenario (scenario II) and add a binding emission bound to it.
- Solve scenario II and calculate emission prices (here `PRICE_EMISSION_NEW`).
- Add the same prices as `tax_emission` to a clone of "baseline" and call it the "emission tax" scenario (scenario III). Check the following:
  a. Emission prices are equal for scenario II and scenario III (looking into `PRICE_EMISSION_NEW`)
  b. Emissions are equal for scenario II and scenario III (looking into `EMISS`)
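Checks (a) and (b) both boil down to comparing a solution variable across two scenarios. A minimal sketch of such a comparison with pandas, using made-up data in place of the actual `scen.var("PRICE_EMISSION_NEW")` output (the helper name and tolerance are assumptions, not code from this PR):

```python
import numpy as np
import pandas as pd

def vars_equal(a, b, on=("year",), col="lvl", rtol=1e-6):
    """Merge two scenario-variable frames on their index columns and
    check that the levels agree within a relative tolerance."""
    merged = a.merge(b, on=list(on), suffixes=("_a", "_b"))
    return bool(np.isclose(merged[f"{col}_a"], merged[f"{col}_b"], rtol=rtol).all())

# Made-up stand-ins for scen_ii.var(...) and scen_iii.var(...)
price_ii = pd.DataFrame({"year": [700, 710, 720], "lvl": [5.0, 8.1, 13.2]})
price_iii = pd.DataFrame({"year": [700, 710, 720], "lvl": [5.0, 8.1, 13.2]})
```

Whether the test should demand exact equality or a tight tolerance is exactly the point debated below.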
I ran this setup for two types of emission bounds, cumulative and yearly, using Westeros tutorials. My observation is that:
- In both cases, check (a) passes.
- In both cases, check (b) does not pass.
So, my first reaction is that:
- Is the above logic for testing this PR correct?
- Why shouldn't we use Westeros tutorials for testing this? (In other words, what changes are brought with the new test that the tutorials cannot replicate?)
Maybe @OFR-IIASA and @gidden can help here.
Hey @behnam-zakeri - thanks so much for looking into this so quickly. I have a comment and two questions:
- In principle there's no reason not to test with the tutorials, in fact I think it's a great idea.
- Can you check how close the objective functions are between the two runs? They should be within the solver's solution tolerance.
- Is the emission bound cumulative or year-on-year? If cumulative, I'd suggest trying year-on-year. In principle we may have an issue with the Hotelling calculations for this.
Dear Behnam,
As there were existing tests, the idea was to leave those in place and extend the test examples covered, while also making sure that the results are the SAME as opposed to just "close". The test for "duality" was "close" before, but the allowed tolerance was too large, hence the error in the formulation wasn't caught. Discussing with Volker at the time when changing the formulation for deriving the price, the idea behind the test was to set up a simple example with a single technology that has emissions, plus add-on technologies which would then allow for emission reduction. In the end, I never really got around to completing the work, hence the tests don't pass, and this was the main task left to do in the PR, i.e. develop a test case that works.
As you rightly point out, there are a few different cases to test, but I have made a few more adjustments below.
TEST 1.: Equal length time periods // CUMULATIVE bound
-> Clone a "baseline" (scenario I) to a new "emission bound" scenario (scenario II) and add a binding CUMULATIVE emission bound to it.
-> Solve scenario II and calculate emission prices (here `PRICE_EMISSION_NEW`).
-> Check: that the calculated emission price matches a manual calculation of the price based on the `EMISSION_EQUIVALENCE`
TEST 2.: Unequal length time periods // CUMULATIVE bound
-> Same as "TEST 1", but with changes in the period duration, so that there are 5- and 10-year timesteps.
TEST 3.: Equal length time periods // PERIOD bound
-> Same as "TEST 1", but with a period-specific bound
TEST 4.: Unequal length time periods // PERIOD bound
-> Same as "TEST 2", but with a period-specific bound
TEST 5.: Test price duality
-> Scenario A.) Apply different CUMULATIVE bounds to a scenario with differing time-period lengths.
-> Scenario B.) Apply the resulting `PRICE_EMISSION` from scenario A as `tax_emission`; check:
- ensure that `EMISS` are equal for scenarios A and B
- ensure that `tax_emission` and `PRICE_EMISSION_NEW` are equal in scenario B
-> Scenario C.) Apply the emission trajectory (`EMISS`) from scenario A as a period-specific `bound_emission`; check:
- ensure that `EMISS` are equal for scenarios A and C
- ensure that `tax_emission` and `PRICE_EMISSION_NEW` are equal in scenario C
In the end, we need to ensure that the carbon price from a cumulative constraint provides the same emission trajectory per period as the cumulative trajectory when applied as `tax_emission` (Test 5, scenario B), and that the emission trajectory per period from the cumulative-constraint scenario provides the same carbon price as when applying a cumulative constraint (Test 5, scenario C).
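Scenario B also involves a small data-reshaping step: `PRICE_EMISSION` is reported per `year`, while the `tax_emission` parameter is indexed by `type_year` and takes `value`/`unit` columns. A hedged sketch of that conversion with made-up numbers (column names follow the usual message_ix conventions, but this is not code from the PR, and the unit is assumed for illustration):

```python
import pandas as pd

# Made-up stand-in for scen_a.var("PRICE_EMISSION")
price = pd.DataFrame({
    "node": "Westeros", "type_emission": "GHG", "type_tec": "all",
    "year": [700, 710, 720], "lvl": [5.0, 8.1, 13.2],
})

# tax_emission is indexed by type_year rather than year, and parameter
# data carry value/unit columns; the unit here is an assumption.
tax = (
    price.rename(columns={"year": "type_year", "lvl": "value"})
    .assign(unit="USD/tCO2")
)
# The tax scenario would then receive this via:
# scen_b.add_par("tax_emission", tax)
```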
@gidden thanks for the reflection. As @OFR-IIASA mentioned, it seems Westeros tutorials are not enough, even though they can be part of the test, because they do not exhibit the change in the model periods, e.g., from 5-year to 10-year. My tests using the tutorial were both for year-by-year and cumulative bounds.
@OFR-IIASA thanks for the updates and confirming the logic for the tests. I see your point, if we need varying model periods we need to develop a new test model.
Thanks for continuing the work here, @behnam-zakeri! Any further improvements to reach our goal are much appreciated :) Just wanted to drop by to say: don't worry about the code quality check, I can easily fix that afterwards :)
@glatterf42 I made the following changes and now this PR is ready for reviewing/improving the tests. Indeed, the existing tests are good enough, but a few of them, e.g. `test_price_duality`, don't pass. Here, @OFR-IIASA can confirm that the changes made in this PR to the GAMS formulation work on the global model, but not on the tests developed here, right?
The changes made today are:
- The GAMS formulation is cleaned up from new variable names and extraneous equations.
- `model.py` is cleaned accordingly.
- `test_feature_price_emission.py` is cleaned from hard-coded data and temporary text info.
- "Addon" technologies are disabled in the test, as it seems they don't do anything specific but make the test a bit complicated. The tests behave the same as before without "Addon".
- Important: For `test_price_duality`, model periods with equal length are added, and the checks don't pass for them either. So there is an issue regardless of whether the model periods are equal or vary between 5 and 10 years.
Still working on getting test case 5 to work. I've tried using `make_westeros`, but that has its model years (`[700, 710, 720]`) hardcoded, so it doesn't work out of the box.
Instead, I've adapted the `model_setup` from what @OFR-IIASA started, and the tests are starting to run with what look like reasonable results.
In order to troubleshoot the tests, I included `tests/feature_price_emission.ipynb`, which uses a new local DB instance to avoid cluttering existing ones. Run something similar to `ixmp platform add test_feature_price_emission jdbc hsqldb /home/fridolin/.local/share/ixmp/localdb/test_feature_price_emission` to use the same name, or adapt it as you see fit.
When I'm running this notebook (which for now just includes evenly spaced years), I find that whichever variable we fix works fine, while the other doesn't. So in 5.1, we fix `tax_emission` based on `PRICE_EMISSION` and this is registered correctly, but the `EMISS` trajectory is off. In 5.2, we apply a period-specific `bound_emission`, which leads to the same trajectory but different prices.
In 5.2, the `ACT` of the technologies seems to be different, while in 5.1, only the marginals of `ACT` seem to differ.
This might speak to a problem in the conversion between emission bounds and emission prices.
If anyone has any idea how to troubleshoot this further or even what specifically might cause this problem, I'm happy to take suggestions :)
I'm sad to have to postpone this (and #723) beyond 3.9, but it doesn't look like this PR will be ready this week. We could consider releasing 3.9.1 if this gets merged not too long after our 3.9 release.
@behnam-zakeri suggested earlier today that it might be useful to take the price_emission notebook temporarily included here and run it on the main branch to see if it works there.