Helper functions to format trait data from external databases (e.g. TRY) for PEcAn.MA
I'd rephrase this. The MA runs on tabular data in a specific format. One way to easily get data in that format is to query BETYdb, but you can also generate that format manually if you have other trait data. Indeed, a great new Issue for first-time PEcAn developers would be to create helper function(s) that reformat trait data from common trait databases (e.g. TRY) into the tabular format this module is expecting. Given that no one is actively updating BETY, this approach is probably going to be the de facto norm for most users, and in the future there should be an update to this demo once we have functions that enable this.
Originally posted by @mdietze in https://github.com/PecanProject/pecan/pull/3707#discussion_r2598184848
Create helper functions to reformat trait data from common external databases (like TRY) into the tabular format expected by the meta-analysis module. This would allow users to easily use data sources other than BETYdb.
I'll add that #3601 provides more info about how the flat files need to be formatted for the MA. A good test case would be the crop C and N trait data that @mattykim06 has been working with for PR #3683, as there's a very real need to set new SIPNET N cycle parameters
Hi @mdietze @AritraDey-Dev, I'm interested in working on this TRY database helper function.
From the description, I understand we need:
- A function to convert TRY database format to PEcAn MA tabular format
- Standard trait name mappings
- Documentation and examples
Before I start implementing, could you clarify:
- Are there any existing examples of the expected PEcAn MA format?
- Should the function handle specific edge cases in TRY data (ranges, qualifiers)?
- Are there any existing tests for data formatting I should follow?
Can you assign this issue to me?
Are there any existing examples of the expected PEcAn MA format?
https://github.com/PecanProject/pecan/blob/09ea6c9da2d82cf0ec31215ea127c56a93441345/base/db/R/query.data.R#L25-L37 This shows the exact SQL columns returned from BETYdb. You can also refer to this test for an example of the expected data format.
Should the function handle specific edge cases in TRY data (ranges, qualifiers)?
Yes, see https://github.com/PecanProject/pecan/blob/develop/modules/meta.analysis/R/jagify.R#L46-L55 and https://github.com/PecanProject/pecan/blob/develop/modules/meta.analysis/R/jagify.R#L70-L88 for examples.
Are there any existing tests for data formatting I should follow?
You can refer to these files: test.jagify.R, test.query.data.R
Done. Started working on it.
@AritraDey-Dev I've created the helper function format_try_for_ma() to convert TRY database data into PEcAn's meta-analysis format.
The function will be located at: pecan/modules/data.remote/R/format_try.R
Function overview:
- Maps TRY columns (StdValue, TraitName, Replicates, ErrorRisk) to PEcAn MA format
- Handles basic column renaming and structure
- Includes parameters for trait name mapping and citation ID
Usage example:
try_data <- data.frame(TraitName = c("Leaf nitrogen content per leaf dry mass"), StdValue = c(2.5), ErrorRisk = c(0.5), Replicates = c(10))
trait_map <- c("Leaf nitrogen content per leaf dry mass" = "leaf_N_concentration")
result <- format_try_for_ma(try_data, trait_map, citation_id = 999)
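For discussion, here is a minimal sketch of what such a helper might look like. It is not the actual implementation. The name format_try_for_ma_sketch is hypothetical, and the output column names (vname, mean, n, citation_id) are assumptions that should be checked against the columns in query.data.R linked above. ErrorRisk is carried through unchanged as error_risk, since the thread does not settle which MA column, if any, it should map to.

```r
# Hypothetical sketch only: output column names are assumptions,
# not the confirmed PEcAn MA schema.
format_try_for_ma_sketch <- function(try_data, trait_map, citation_id = NA) {
  # Fail early if the TRY export lacks a column used below
  required <- c("TraitName", "StdValue", "ErrorRisk", "Replicates")
  missing_cols <- setdiff(required, names(try_data))
  if (length(missing_cols) > 0) {
    stop("TRY data is missing columns: ", paste(missing_cols, collapse = ", "))
  }

  # Use character trait names so factor columns index trait_map by name
  trait_names <- as.character(try_data$TraitName)

  # Keep only rows whose TRY trait name has a mapping
  keep <- trait_names %in% names(trait_map)
  try_data <- try_data[keep, , drop = FALSE]
  trait_names <- trait_names[keep]

  data.frame(
    vname = unname(trait_map[trait_names]),          # mapped trait name
    mean = try_data$StdValue,                        # standardized trait value
    n = try_data$Replicates,                         # sample size
    error_risk = try_data$ErrorRisk,                 # passed through unchanged
    citation_id = rep(citation_id, nrow(try_data)),  # one citation for all rows
    stringsAsFactors = FALSE
  )
}
```

Rows whose trait name is not in trait_map are dropped silently here. Whether to drop, warn, or stop in that case is an open design choice.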
Next steps I need help with:
- Could you provide a sample of actual TRY database export data to test with real values?
- How should I handle species mapping from TRY's AccSpeciesName to PEcAn's specie_id?
- Are there specific edge cases in TRY data (like ranges, qualifiers) I should prioritize handling?
I'm ready to refine this based on feedback and add more functionality once I can test with real TRY data.