xcdat icon indicating copy to clipboard operation
xcdat copied to clipboard

[PR]: Enable `skipna` for spatial and temporal mean operations

Open lee1043 opened this issue 1 year ago • 9 comments

Description

Enable skipna parameter for mean operation.

  • Closes #582

Checklist

  • [x] My code follows the style guidelines of this project
  • [x] I have performed a self-review of my own code
  • [x] My changes generate no new warnings
  • [x] Any dependent changes have been merged and published in downstream modules

If applicable:

  • [ ] I have added tests that prove my fix is effective or that my feature works
  • [ ] New and existing unit tests pass with my changes (locally and CI/CD build)
  • [x] I have commented my code, particularly in hard-to-understand areas
  • [x] I have made corresponding changes to the documentation
  • [ ] I have noted that this is a breaking change for a major release (fix or feature that would cause existing functionality to not work as expected)

lee1043 avatar May 31 '24 08:05 lee1043

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 100.00%. Comparing base (595845f) to head (5d76c61). Report is 1 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff            @@
##              main      #655   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files           16        16           
  Lines         1680      1681    +1     
=========================================
+ Hits          1680      1681    +1     

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

:rocket: New features to boost your workflow:
  • :snowflake: Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

codecov[bot] avatar May 31 '24 08:05 codecov[bot]

Test code attached: Archive.zip

lee1043 avatar May 31 '24 08:05 lee1043

Thanks for this PR @lee1043! I will more closely either some time at the end of this week or early next week.

tomvothecoder avatar Jun 03 '24 16:06 tomvothecoder

@lee1043 – Thanks for taking on this PR. This seems like a handy option to have in xcdat. I just took a quick look at this – it seems like skipna should be added to the docstring (since it is now available in the public API). I'm also curious if you can copy a couple existing unit tests, add a np.nan value somewhere, and ensure that the NaN value propagates to the final result (I think this is the expected behavior).

pochedls avatar Jun 03 '24 18:06 pochedls

@pochedls thanks for the suggestion. I will work on these and ping you and @tomvothecoder for review once ready.

lee1043 avatar Jun 06 '24 00:06 lee1043

I'm assuming that we want to add the skipna arg to group_average(), climatology(), and departures() (in addition to average()). Is this correct @lee1043 and @pochedls? I made these updates in this commit 7f9d2e3 (#655) along with adding unit tests.

tomvothecoder avatar Jun 24 '24 17:06 tomvothecoder

I'm assuming that we want to add the skipna arg to group_average(), climatology(), and departures() (in addition to average()). Is this correct @lee1043 and @pochedls? I made these updates in this commit 7f9d2e3 (#655) along with adding unit tests.

I think so. This probably needs to be tested on real data (I can't think of unintended consequences, but it is worth stress testing a little).

pochedls avatar Jun 24 '24 17:06 pochedls

I'm assuming that we want to add the skipna arg to group_average(), climatology(), and departures() (in addition to average()). Is this correct @lee1043 and @pochedls? I made these updates in this commit 7f9d2e3 (#655) along with adding unit tests.

I think so. This probably needs to be tested on real data (I can't think of unintended consequences, but it is worth stress testing a little).

Can you upload one of your model validation scripts in the xcdat-validation repo? I would like to re-use it for testing.

tomvothecoder avatar Jul 30 '24 21:07 tomvothecoder

Getting back to this. I don't have an appropriate validation script for this, but may be able to help with this if there is an obvious minimum validation example. For example, is there an equivalent set of CDAT commands that xcdat should be able to match here? I think we would need some reference code to develop a larger model validation script.

pochedls avatar Aug 21 '24 20:08 pochedls

11/6/24 Testing Requirements

Test skipna which skips missing values (as marked by NaN). It should only skip missing values for float dtypes.

The average should be calculated using non-NaN values or just return NaN for the average when there are all NaN values in a group being averaged over.

Prerequisites

  • [x] Review unit tests and update as needed
  • [x] Find equivalent set of CDAT commands to test against skipna feature
    • CDAT relies on NumPy masked arrays, which means averaging routines automatically ignore masked (ie.e, invalid or missing) values. There isn't a separate skipna parameter.
  • [ ] Write a test script that compares xCDAT averaging APIs against CDAT. Use the same test datasets (https://github.com/xCDAT/xcdat-data) with missing values either existing or inserted. -- IN PROGRESS
    • APIs to test: spatial.spatial_avg(), temporal.average(), temporal.group_average(), temporal.climatology(), temporal.departures()

Results

TBD

tomvothecoder avatar Nov 06 '24 23:11 tomvothecoder

Just rebased this branch on the latest main. Please update your local branch accordingly.

tomvothecoder avatar Nov 06 '24 23:11 tomvothecoder

I will re-test this branch with the test script I made earlier.

lee1043 avatar Apr 09 '25 15:04 lee1043

Confirmed that the skipna capability is working as expected for a sample and real world datasets. Details in the following test notebook: test_skipna_20250409.pdf

lee1043 avatar Apr 09 '25 16:04 lee1043

Updated skipna test:

Conclusion: the skipna parameter is working as expected. CDAT uses skipna=True approach, which is consistent to the xCDAT result from this PR.

lee1043 avatar Apr 09 '25 19:04 lee1043

Updated skipna test:

* xCDAT test with a real world data: [test_skipna_20250409.pdf](https://github.com/user-attachments/files/19673209/test_skipna_20250409.pdf)

* Comparison with CDAT: [test_skipna_20250409-cdms2-test.pdf](https://github.com/user-attachments/files/19673210/test_skipna_20250409-cdms2-test.pdf)

Conclusion: the skipna parameter is working as expected. CDAT uses skipna=True approach, which is consistent to the xCDAT result from this PR.

Thank you Jiwoo! I will do a final review and merge soon.

tomvothecoder avatar Apr 09 '25 19:04 tomvothecoder

Thank you Jiwoo! I will do a final review and merge soon.

@tomvothecoder thanks! the notebooks were added to the validation repo, see https://github.com/xCDAT/xcdat-validation/pull/58

lee1043 avatar Apr 09 '25 19:04 lee1043