xcdat icon indicating copy to clipboard operation
xcdat copied to clipboard

Adds option to output regridding weights

Open jasonb5 opened this issue 8 months ago • 9 comments

Description

Adds output_weights option to regrid2 and xesmf regridders. Can be set to true/false to include weights as weights variable in output dataset. If it's a str, this will be the name of the variable in the output dataset.

The regrid2 regridder formats it's weights as a sparse array with dimensions (out_y, out_x, in_y, in_x). This matches the behavior of the xesmf regridder. Sparse array is used for space efficiency.

  • Closes #550

Checklist

  • [ ] My code follows the style guidelines of this project
  • [ ] I have performed a self-review of my own code
  • [ ] My changes generate no new warnings
  • [ ] Any dependent changes have been merged and published in downstream modules

If applicable:

  • [ ] I have added tests that prove my fix is effective or that my feature works
  • [ ] New and existing unit tests pass with my changes (locally and CI/CD build)
  • [ ] I have commented my code, particularly in hard-to-understand areas
  • [ ] I have made corresponding changes to the documentation
  • [ ] I have noted that this is a breaking change for a major release (fix or feature that would cause existing functionality to not work as expected)

jasonb5 avatar Apr 11 '25 19:04 jasonb5

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 100.00%. Comparing base (aaf9ec3) to head (b7f8795). Report is 1 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff            @@
##              main      #752   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files           16        16           
  Lines         1722      1765   +43     
=========================================
+ Hits          1722      1765   +43     

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

:rocket: New features to boost your workflow:
  • :snowflake: Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

codecov[bot] avatar Apr 11 '25 19:04 codecov[bot]

@pochedls @lee1043 @chengzhuzhang @tomvothecoder I wanted to add one more feature to this PR but want to get a consensus first. I want to update the grid functions to not just output a dataset with spatial coords/dimensions but also include a mask variable. I'm curious if this makes sense from a user perspective.

jasonb5 avatar Apr 11 '25 19:04 jasonb5

@pochedls @lee1043 @chengzhuzhang @tomvothecoder I wanted to add one more feature to this PR but want to get a consensus first. I want to update the grid functions to not just output a dataset with spatial coords/dimensions but also include a mask variable. I'm curious if this makes sense from a user perspective.

I don't see a near-term need for this (for myself) if the regridder is automatically NaN-ing out missing data in a way that is reasonable. But if the mask is useful for interpreting the regridder behavior and it is easy to include it as an output, I'm good with this.

pochedls avatar Apr 11 '25 20:04 pochedls

I think this function is useful to have for testing purpose. With the function it would be easier to compare weightings of different regridders. This might be also helpful for high resolution data where regridding can take longer than coarser grid. A following question would be, can the regridder accept re-use of pre-calculated weights? I support this addition unless it significantly drop the performance (which I think it is very unlikely). Thank you for the addition, @jasonb5!

lee1043 avatar Apr 11 '25 22:04 lee1043

I forgot to add, make sure to update pyproject.toml dependencies too:

https://github.com/xCDAT/xcdat/blob/4f6313ed7bba641d8b32e5ec5f3aedf0009e558b/pyproject.toml#L25-L36

tomvothecoder avatar Apr 14 '25 16:04 tomvothecoder

I'm curious, what are some scenarios where pre-calculated weights might be used/preferred over internally generated weights?

@pochedls and @chengzhuzhang can weigh in on the usefulness of this enhancement.

I don't know if I would use this frequently, but I could see it being useful and a reasonable feature to include. I think if this is a matter of just adding this as an input/output, then it isn't a big deal to add, but if it requires more work it might make sense to tackle this on a longer time horizon with a specific issue/PR.

pochedls avatar Apr 14 '25 17:04 pochedls

I'm curious, what are some scenarios where pre-calculated weights might be used/preferred over internally generated weights? @pochedls and @chengzhuzhang can weigh in on the usefulness of this enhancement.

I don't know if I would use this frequently, but I could see it being useful and a reasonable feature to include. I think if this is a matter of just adding this as an input/output, then it isn't a big deal to add, but if it requires more work it might make sense to tackle this on a longer time horizon with a specific issue/PR.

Jill mentioned in our 1:1 meeting that it might be performant to re-use the internally generated weights (outputted via this PR) for variables on the same grid.

I think xCDAT's xESMF regridder already supports this feature by inheriting xESMF's weights parameter to use custom weights (link). I have not tested this yet.

I suggest we open a separate GitHub issue and PR to tackle this enhancement for Regrid2 if we think it will be useful.

tomvothecoder avatar Apr 14 '25 18:04 tomvothecoder

I'm going to go ahead and add the mask variable to the grid dataset returned from ds.regridder.grid property. This will simplify a few lines of code in this PR.

As for re-using weights as Tom mentioned xESMF already supports this and it would be easy to add for Regrid2. I can create an new issue to discuss this.

jasonb5 avatar Apr 14 '25 19:04 jasonb5

I'm going to go ahead and add the mask variable to the grid dataset returned from ds.regridder.grid property. This will simplify a few lines of code in this PR.

As for re-using weights as Tom mentioned xESMF already supports this and it would be easy to add for Regrid2. I can create an new issue to discuss this.

Thanks @jasonb5. This PR looks like it'll be good to go after you add the mask variable with tests.

Once you merge, I can proceed with releasing v0.9.0.

tomvothecoder avatar Apr 14 '25 20:04 tomvothecoder

@tomvothecoder Is it fine to merge now?

jasonb5 avatar May 23 '25 20:05 jasonb5

Hey @jasonb5, I had some review comments. I think we need to add the same create_nan_mask logic in the Regrid2 code to the xESMF regridder too. That way xESMF can use the correct "mask" variable from the source variable for more accurate regridding. Right now it just transposes any existing "mask" variable.

Once this is implemented with tests, you can merge.

This is fine, I can add it now. Didn't know if we wanted to address this in a later PR.

jasonb5 avatar May 27 '25 16:05 jasonb5

@tomvothecoder This is done once the CI passes.

jasonb5 avatar May 29 '25 23:05 jasonb5

I rebased this branch on main, fixed docstrings and type annotations, and added a suggestion from GH Copilot (here).

I will merge now. Thanks for the work, @jasonb5!

tomvothecoder avatar Jun 02 '25 18:06 tomvothecoder