pecan icon indicating copy to clipboard operation
pecan copied to clipboard

Add IC_SOILGRID_Utilities for SoilGrids Data Processing

Open divine7022 opened this issue 8 months ago • 4 comments

Description

This pull request introduces a new utility script, IC_SOILGRID_Utilities.R, designed to facilitate the processing of SoilGrids data for generating soil carbon initial condition (IC) files. The new script provides essential functions for extracting, processing, and generating ensemble members from SoilGrids250m data. Key changes include the implementation of general get.site.info function, which extracts site information from various input formats.

New Functionality:

1) Implemented general and reusable get.site.info function to standardize site information handling:

  • Extracts data from Settings, MultiSettings, or flexible enough with CSV inputs too.
  • Validates required fields (site_id, lat, lon) and data types.
  • Supports both single sites and vectorized site data.
  • Includes coordinate validation with strict/lenient modes.
  • Returns consistent data frame output (site_id, name, lat, lon, str_id).
  • Testing:
    • Developed a comprehensive test suite for the get.site.info function, covering various scenarios such as settings objects, CSV files, MultiSettings objects, and vectorized site information.
    • Ensured that all tests pass successfully, validating the correctness of the new functionality.

2) soilgrids_ic_process:

  • A comprehensive function that processes SoilGrids data to create initial condition files. It extracts soil carbon data, handles missing values, generates ensemble members, and writes output to NetCDF files (End-to-end workflow from input to NetCDF).
  • Supports input from both PEcAn settings lists and optional CSV files for site information.
  • Includes parameters for output directory management, file overwriting, and verbosity for detailed logging.

3) preprocess_soilgrids_data:

  • A helper function that preprocesses the raw soil carbon data, handling missing values and ensuring data integrity.
  • Implements logic for scaling and defaulting values where necessary.

4) generate_soilgrids_ensemble:

  • A function that generates ensemble members for a given site based on processed soil carbon data, ensuring reproducibility through random seed setting.
  • Uses truncated normal distribution (no negative values).
  • Seed set by site_id for deterministic results.
  • Configurable ensemble size.

Motivation and Context

Review Time Estimate

  • [ ] Immediately
  • [ ] Within one week
  • [ ] When possible

Types of changes

  • [ ] Bug fix (non-breaking change which fixes an issue)
  • [ ] New feature (non-breaking change which adds functionality)
  • [ ] Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

  • [ ] My change requires a change to the documentation.
  • [ ] My name is in the list of CITATION.cff
  • [ ] I agree that PEcAn Project may distribute my contribution under any or all of
    • the same license as the existing code,
    • and/or the BSD 3-clause license.
  • [ ] I have updated the CHANGELOG.md.
  • [ ] I have updated the documentation accordingly.
  • [ ] I have read the CONTRIBUTING document.
  • [ ] I have added tests to cover my changes.
  • [ ] All new and existing tests passed.

divine7022 avatar Apr 11 '25 00:04 divine7022

Hi @dlebauer @infotroph Could you please review this PR when you have a moment?

Thank you!

divine7022 avatar Apr 11 '25 01:04 divine7022

I'll add tests for IC_SOILGRID_Utilities after addressing your feedback. Currently, this PR only includes tests for get.site.info.

divine7022 avatar Apr 11 '25 01:04 divine7022

  1. Unclear where these functions fit in the larger workflow, as I don't see any changes to any of the ic process functions that would be calling this (and merging it with aboveground data). Would be good to check that any function, and function argument, naming conventions used here are consistent with conventions assumed within the larger ic process workflow to ensure interoperability and polymorphism.
  2. I'd really like @Qianyuxuan and @DongchenZ to review this before it is approved as a lot of this feels redundant with all the existing SoilGrids code they have developed. It's not obvious to me how this work fits with prior SoilGrids code or with prior SOC IC code.
  3. bookdown documentation not updated

mdietze avatar Apr 11 '25 11:04 mdietze

Hello sir, I’ve addressed the previous concerns in my comments and would value your thoughts on the updated approach. Could you please take a look whenever you have a moment.
Thank you!

divine7022 avatar Apr 17 '25 04:04 divine7022

@divine7022 could you please resolve conflicts?

@mdietze and @DongchenZ have all of your suggestions / requested changes been addressed?

dlebauer avatar Aug 04 '25 21:08 dlebauer