Please confirm that this PR has done the following:

[x] Tests Added
[ ] Documentation Added
[ ] Name of contributors Added to AUTHORS.rst
[ ] Description in RELEASE_NOTES.md Added

Adding to RELEASE_NOTES.md (remove section after adding to RELEASE_NOTES.md)

Please add a single line in the release notes similar to the following:

- (#XX)[http://link-to-pr.com] Add read_netcdf() function

Description of PR

This PR adds a read_netcdf() function to read netcdf data (.nc). Note that this function currently does not support sub-annual data and meta indicators have the exact indices from META_IDX (currently ‘model’ and ‘scenario’).

Sep 05 '24 15:09 LinhHo

Thanks a lot for your comments! To answer your question about aggregated Calliope netCDF, the current output from Calliope (“standard” netCDF) has five dimensions: nodes, techs, carriers, timesteps, and costs. They contain variables according to the model’s specifications, e.g., flow cap, flow in, flow out, flow export, etc.

Aggregated Calliope netCDF (the input data for pyam.read_netcdf) are aggregated or disintegrated from standard Calliope netCDF files into variables compatible with the IAMC standard, such as Primary Energy | Coal. They have four dimensions: model, scenario, region, and time. Unit is an attribute of each variable. In the future, Calliope will be updated to also produce aggregated netCDF as an optional output.

I will be on vacation for the next two weeks and will address any further comments from you when I'm back. Best regards, Linh

Sep 29 '24 20:09 LinhHo

Sorry for the delay in the review, due to your and my vacation and then too many other ongoing activities...

Nov 12 '24 07:11 danielhuppmann

Pinging @coroa @Rlamboll @znicholls, @LinhHo started implementation of a read_netcdf method in pyam. Could you take a look at whether the nc file that she is using is structured in a way that is useful for the IAMC-format domain (including datetime support)?

Nov 12 '24 07:11 danielhuppmann

There's lots of ways to do something like this with netCDF (similar to CSV, lots of options). The trick is more about just picking something that works.

If all you're doing is reading data, then it doesn't really matter. I would probably give the method a more specifc name. E.g. read_netcdf_caliope, because you don't have an equivalent write_netcdf function that makes it more obvious what format is expected.

There are probably some performance things you could improve in the PR, but if you test that it can read data with time columns and test that it can read data with year columns, you're probably good to go and you can fix performance later.

If you want to also support reading more arbitrary netCDF and writing netCDF, read this

The first thing to understand is that there isn't a standard netCDF format. People do crazy things, so supporting general netCDF isn't really a thing. If you have a write_netcdf method though, it's easier to be like, "We support reading netCDF written with write_netcdf or equivalents thereof.".

The thing we found difficult in scmdata was supporting arbitrary metadata columns when writing. That makes your life much trickier. If you don't care about capturing the meta table, that would make things simpler.

The other thing that can be tricky is the sparsity of data. If you want to store lots of model-scenario combinations, but not all models have run all scenarios, depending on how you format the data, you can end up with a lot of nans which can be annoying for memory or data storage/handling.

If you want the full details, our docs go into this sparsity issue here: https://scmdata.readthedocs.io/en/latest/notebooks/netcdf.html

In terms of how we implemented it, handling all our use cases was tricky (see https://github.com/openscm/scmdata/blob/main/src/scmdata/netcdf.py and https://github.com/openscm/scmdata/blob/main/src/scmdata/_xarray.py). You could do something much simpler if you make more restrictions than we did.

Nov 12 '24 09:11 znicholls

Thanks for the comments! Indeed, there is no standard netCDF format. This function aims to read a netCDF format that is compatible with the IAMC convention and can be read easily into pyam.IamDataFrame format. I wrote a description of how the netCDF input file should be structured for the function read_netcdf to work properly, please see this Jupyter notebook. Since anyone can prepare their data in this netCDF format to use read_netcdf, and there is nothing in it specifically for Calliope, I would say read_netcdf_calliope is not necessary.

@danielhuppmann please let me know if putting the Jupyter notebook in the docs/tutorials is appropriate. I was not sure where to put the documentation. Thank you!

Dec 03 '24 23:12 LinhHo

please see this Jupyter notebook

Looks very nice!

Dec 04 '24 07:12 znicholls

Awesome, thanks @LinhHo !!

Dec 04 '24 07:12 gidden

Looks like you found a very inconvenient corner case... https://docs.xarray.dev/en/stable/user-guide/io.html

I suggest the following: replace all 'None's in string-columns in meta by "NaN" when writing to netcdf, and then replace them again when reading from file.

Dec 10 '24 21:12 danielhuppmann

@LinhHo, I think with the quickfix I proposed above (replacing "nan"-strings by np.nan and a little bit of clean-up (like adding a readme, removing the print statement), this should be good to go, right?

Dec 13 '24 10:12 danielhuppmann

@danielhuppmann Yes, I think so. Thanks a lot for all the comments! I have just updated the release note and cleaned up the code. Could you have an overall look again if we can finalise this PR, please? By readme did you mean a documentation like this Jupyter notebook? https://github.com/LinhHo/pyam/blob/netcdf/docs/tutorials/read_netcdf.ipynb

Dec 13 '24 12:12 LinhHo

readme did you mean a documentation like this Jupyter notebook?

Sorry, I meant release notes. Which you have already added but in the wrong place.

Dec 13 '24 13:12 danielhuppmann

Tests are failing because the lock file isn't up to date and the release-note ist still in the wrong place - I made https://github.com/LinhHo/pyam/pull/1 to fix this for you. Please review and approach that PR, then tests for this PR should pass.

Dec 13 '24 14:12 danielhuppmann

I took the liberty of pushing an update of the lock file directly to your branch

Dec 13 '24 16:12 danielhuppmann

Looks like tests are failing because matplotlib published a new release, so the plots generated in the tests do not exactly match the expected figures. I will take care of that next week.

Dec 13 '24 16:12 danielhuppmann

Codecov Report

Attention: Patch coverage is 81.81818% with 8 lines in your changes missing coverage. Please review.

Project coverage is 95.0%. Comparing base (ddbb88e) to head (6b73a54). Report is 46 commits behind head on main.

Files with missing lines	Patch %	Lines
pyam/netcdf.py	79.4%	8 Missing :warning:

Additional details and impacted files

@@          Coverage Diff           @@
##            main    #879    +/-   ##
======================================
  Coverage   95.0%   95.0%            
======================================
  Files         64      65     +1     
  Lines       6134    6278   +144     
======================================
+ Hits        5828    5965   +137     
- Misses       306     313     +7

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Dec 14 '24 12:12 codecov[bot]

I realized that if I fix the matplotlib expected-plots in a separate branch, that would again create merge conflicts on poetry.lock. So I added the expected plots to this branch.

Dec 14 '24 12:12 danielhuppmann

Add read_netcdf function

Please confirm that this PR has done the following:

Adding to RELEASE_NOTES.md (remove section after adding to RELEASE_NOTES.md)

Description of PR

Codecov Report