sgn
Phenotype Download returns oldest observation if there are repeat measures
Expected Behavior
If there are repeat measures for a specific observation unit/trait combination, the phenotype download currently returns the oldest observation, as determined by the collection date of the observation stored in the database. This is inconsistent with a user's expectation that the latest observation be returned in the file.
This overlaps with issue https://github.com/solgenomics/sgn/issues/3630
Steps to Reproduce
- Collect observations with timestamps
- Upload those observations to a trial
- Collect new (and different) observations for the same traits/observation units with different timestamps
- Upload the second round of observations
- Download the phenotype data for the trial
- Verify that the value of the observation unit/trait combo is equal to the first observation uploaded
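The mismatch described in the steps above can be illustrated with a small Python sketch. This is not the sgn download code: the row layout, the function name `pick_one_per_combo`, and the sample values are all assumptions made for illustration.

```python
from datetime import datetime

# Hypothetical rows: (observation_unit, trait, value, collect_date)
observations = [
    ("plot_1", "plant_height", 56.0, datetime(2024, 1, 20, 12, 0)),
    ("plot_1", "plant_height", 68.0, datetime(2024, 1, 21, 12, 0)),
    ("plot_2", "plant_height", 40.0, datetime(2024, 1, 20, 12, 0)),
]


def pick_one_per_combo(rows, newest=True):
    """Keep a single value per (observation unit, trait) combination.

    newest=True keeps the observation with the latest collect date,
    newest=False keeps the one with the earliest collect date.
    """
    chosen = {}
    for unit, trait, value, collected in rows:
        key = (unit, trait)
        if key not in chosen:
            chosen[key] = (value, collected)
        elif newest and collected > chosen[key][1]:
            chosen[key] = (value, collected)
        elif not newest and collected < chosen[key][1]:
            chosen[key] = (value, collected)
    return {key: value for key, (value, _) in chosen.items()}


reported = pick_one_per_combo(observations, newest=False)  # behavior reported here
expected = pick_one_per_combo(observations, newest=True)   # what users expect
```

With the sample rows, `reported` holds 56.0 for `plot_1` while `expected` holds 68.0, which is the discrepancy the last step checks for.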
Repeat measurements can only be handled properly if we store additional metadata with each variable. Proposal: a new variable property, `repeat_type`, defines how the variable behaves for multiple measurements. Possible `repeat_type` values could be `single`, `multiple`, and `time_series`. When `single` is associated with a trait, a new measurement overwrites the previous trait value. If `multiple` is associated with the trait, values continue to be added, including timestamps. Averages can still be computed over all the values, as they all represent the same phenotype. For `time_series`, multiple measurements are recorded, but the resulting data cannot be averaged, so no average will be shown for them on the trial detail page. The data can instead be displayed as a growth curve, or used to derive growth parameters, etc.
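As a rough illustration of the proposed semantics, the Python sketch below models the three `repeat_type` values on an in-memory list. The function names `record` and `summary_average` and the list-of-tuples storage are hypothetical and do not reflect the sgn schema.

```python
from statistics import mean

REPEAT_TYPES = {"single", "multiple", "time_series"}


def record(store, repeat_type, value, timestamp):
    """Add a measurement to `store` (a list of (value, timestamp) tuples)."""
    if repeat_type not in REPEAT_TYPES:
        raise ValueError(f"unknown repeat_type: {repeat_type!r}")
    if repeat_type == "single":
        # A single-valued trait keeps only the newest measurement.
        store[:] = [(value, timestamp)]
    else:
        # multiple and time_series both accumulate timestamped values.
        store.append((value, timestamp))


def summary_average(store, repeat_type):
    """Return the average to display, or None when no average applies."""
    if repeat_type == "time_series" or not store:
        # Time series points are not replicates, so they are not averaged.
        return None
    return mean(value for value, _ in store)


single_store, multiple_store, series_store = [], [], []
for value, stamp in [(56, "2024-01-20"), (68, "2024-01-21")]:
    record(single_store, "single", value, stamp)
    record(multiple_store, "multiple", value, stamp)
    record(series_store, "time_series", value, stamp)
```

Under these assumptions the `single` store ends up with one entry, while the `multiple` and `time_series` stores keep both; only the `multiple` store yields an average.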
Variables with `multiple` and `time_series` associations, when downloaded in spreadsheet format, could be stored in one cell separated by a delimiter, such as `|`. In the extended format, the measurement timestamp is given in parentheses, so an example cell could contain `56(2024-01-20 12:00)|68(2024-01-21)|63(2024-01-27)`.
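A minimal Python sketch of how such a cell might be assembled follows. The function name `format_cell` and its parameters are hypothetical; only the `|` delimiter and the value(timestamp) layout come from the proposal.

```python
def format_cell(measurements, extended=False, delimiter="|"):
    """Join repeat measurements into one spreadsheet cell.

    `measurements` is a list of (value, timestamp) pairs. In extended
    mode each value is followed by its timestamp in parentheses.
    """
    parts = []
    for value, timestamp in measurements:
        if extended:
            parts.append(f"{value}({timestamp})")
        else:
            parts.append(str(value))
    return delimiter.join(parts)


measurements = [
    (56, "2024-01-20 12:00"),
    (68, "2024-01-21"),
    (63, "2024-01-27"),
]
plain_cell = format_cell(measurements)
extended_cell = format_cell(measurements, extended=True)
```

Here `plain_cell` is `56|68|63` and `extended_cell` reproduces the example cell from the proposal.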
For `multiple` type variables, downloads could also average these numbers upon request for easier downstream analysis.
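For downstream analysis, a delimited cell could also be averaged after download. The sketch below assumes the cell layout proposed in this comment (values separated by `|`, with an optional timestamp in parentheses after each value); the function name `average_cell` is hypothetical.

```python
from statistics import mean


def average_cell(cell, delimiter="|"):
    """Average the numeric values in a delimited cell.

    Each entry is either a bare value ("56") or a value followed by a
    timestamp in parentheses ("56(2024-01-20 12:00)"). Returns None for
    an empty cell.
    """
    if not cell.strip():
        return None
    values = []
    for entry in cell.split(delimiter):
        # Drop the optional "(timestamp)" suffix before converting.
        number_text = entry.split("(", 1)[0].strip()
        values.append(float(number_text))
    return mean(values)


average = average_cell("56(2024-01-20 12:00)|68(2024-01-21)|63(2024-01-27)")
```

This only makes sense for `multiple` variables, since the proposal treats `time_series` values as not averageable.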
I wanted to follow up on this issue and see where things stand with being able to download data that has multiple values from the database. I can see the values under raw data within a trial, but the download still includes only the most recent value.
Working on it. Should be available in a few weeks
From: Heather Manching. Sent: Monday, July 1, 2024 4:34:32 PM. To: solgenomics/sgn. Cc: Lukas A. Mueller. Subject: Re: [solgenomics/sgn] Phenotype Download returns oldest observation if there are repeat measures (Issue #4419)