[PR]: Enable `skipna` for spatial and temporal mean operations
Description
Enable skipna parameter for mean operation.
- Closes #582
Checklist
- [x] My code follows the style guidelines of this project
- [x] I have performed a self-review of my own code
- [x] My changes generate no new warnings
- [x] Any dependent changes have been merged and published in downstream modules
If applicable:
- [ ] I have added tests that prove my fix is effective or that my feature works
- [ ] New and existing unit tests pass with my changes (locally and CI/CD build)
- [x] I have commented my code, particularly in hard-to-understand areas
- [x] I have made corresponding changes to the documentation
- [ ] I have noted that this is a breaking change for a major release (fix or feature that would cause existing functionality to not work as expected)
Codecov Report
All modified and coverable lines are covered by tests :white_check_mark:
Project coverage is 100.00%. Comparing base (
595845f) to head (5d76c61). Report is 1 commits behind head on main.
Additional details and impacted files
@@ Coverage Diff @@
## main #655 +/- ##
=========================================
Coverage 100.00% 100.00%
=========================================
Files 16 16
Lines 1680 1681 +1
=========================================
+ Hits 1680 1681 +1
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.
:rocket: New features to boost your workflow:
- :snowflake: Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
Test code attached: Archive.zip
Thanks for this PR @lee1043! I will more closely either some time at the end of this week or early next week.
@lee1043 – Thanks for taking on this PR. This seems like a handy option to have in xcdat. I just took a quick look at this – it seems like skipna should be added to the docstring (since it is now available in the public API). I'm also curious if you can copy a couple existing unit tests, add a np.nan value somewhere, and ensure that the NaN value propagates to the final result (I think this is the expected behavior).
@pochedls thanks for the suggestion. I will work on these and ping you and @tomvothecoder for review once ready.
I'm assuming that we want to add the skipna arg to group_average(), climatology(), and departures() (in addition to average()). Is this correct @lee1043 and @pochedls? I made these updates in this commit 7f9d2e3 (#655) along with adding unit tests.
I'm assuming that we want to add the
skipnaarg togroup_average(),climatology(), anddepartures()(in addition toaverage()). Is this correct @lee1043 and @pochedls? I made these updates in this commit7f9d2e3(#655) along with adding unit tests.
I think so. This probably needs to be tested on real data (I can't think of unintended consequences, but it is worth stress testing a little).
I'm assuming that we want to add the
skipnaarg togroup_average(),climatology(), anddepartures()(in addition toaverage()). Is this correct @lee1043 and @pochedls? I made these updates in this commit7f9d2e3(#655) along with adding unit tests.I think so. This probably needs to be tested on real data (I can't think of unintended consequences, but it is worth stress testing a little).
Can you upload one of your model validation scripts in the xcdat-validation repo? I would like to re-use it for testing.
Getting back to this. I don't have an appropriate validation script for this, but may be able to help with this if there is an obvious minimum validation example. For example, is there an equivalent set of CDAT commands that xcdat should be able to match here? I think we would need some reference code to develop a larger model validation script.
11/6/24 Testing Requirements
Test skipna which skips missing values (as marked by NaN). It should only skip missing values for float dtypes.
The average should be calculated using non-NaN values or just return NaN for the average when there are all NaN values in a group being averaged over.
Prerequisites
- [x] Review unit tests and update as needed
- [x] Find equivalent set of CDAT commands to test against
skipnafeature- CDAT relies on NumPy masked arrays, which means averaging routines automatically ignore masked (ie.e, invalid or missing) values. There isn't a separate
skipnaparameter.
- CDAT relies on NumPy masked arrays, which means averaging routines automatically ignore masked (ie.e, invalid or missing) values. There isn't a separate
- [ ] Write a test script that compares xCDAT averaging APIs against CDAT. Use the same test datasets (https://github.com/xCDAT/xcdat-data) with missing values either existing or inserted. -- IN PROGRESS
- APIs to test:
spatial.spatial_avg(),temporal.average(),temporal.group_average(),temporal.climatology(),temporal.departures()
- APIs to test:
Results
TBD
Just rebased this branch on the latest main. Please update your local branch accordingly.
I will re-test this branch with the test script I made earlier.
Confirmed that the skipna capability is working as expected for a sample and real world datasets. Details in the following test notebook: test_skipna_20250409.pdf
Updated skipna test:
- xCDAT test with a real world data: test_skipna_20250409.pdf
- Comparison with CDAT: test_skipna_20250409-cdms2-test.pdf
Conclusion: the skipna parameter is working as expected. CDAT uses skipna=True approach, which is consistent to the xCDAT result from this PR.
Updated skipna test:
* xCDAT test with a real world data: [test_skipna_20250409.pdf](https://github.com/user-attachments/files/19673209/test_skipna_20250409.pdf) * Comparison with CDAT: [test_skipna_20250409-cdms2-test.pdf](https://github.com/user-attachments/files/19673210/test_skipna_20250409-cdms2-test.pdf)Conclusion: the
skipnaparameter is working as expected. CDAT usesskipna=Trueapproach, which is consistent to the xCDAT result from this PR.
Thank you Jiwoo! I will do a final review and merge soon.
Thank you Jiwoo! I will do a final review and merge soon.
@tomvothecoder thanks! the notebooks were added to the validation repo, see https://github.com/xCDAT/xcdat-validation/pull/58