Repeat observation not allowed in PUT brapi/v1/observations anymore

Open ctucker3 opened this issue 4 years ago • 8 comments

There was a change in this commit:

https://github.com/solgenomics/sgn/commit/b3399347190f7c66142a96cea8ea08068e3d9d74

As a result, repeat observations are now ignored. I'm not sure whether this was an intentional change, but it would be useful to still be able to store repeat observations. @bellerbrock, do you remember if you had a reason for that change?

ctucker3 • Jul 20 '21 17:07

It also affects v2 /observations.

MFlores2021 • Jul 20 '21 17:07

Also, the guilty line is 568.

ctucker3 • Jul 20 '21 19:07

Needs more discussion :-) Is this a feature or a bug? Others have requested upload of multiple observations per trial and trait... Values with the same timestamp should not be allowed, though. The problem is the download: it assumes one value per trial/trait, so that needs to be fixed.

lukasmueller • Jul 22 '21 15:07

Not sure of its original intention, but the observations parse method in repos/sgn/lib/CXGN/Phenotypes/ParseUpload/Plugin/Observations.pm only throws an error for non-unique observation unit + variable + timestamp combinations. So it seems that if it started as a bug, somewhere along the line it turned into a feature.
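A minimal Python sketch of the uniqueness rule described above, assuming each parsed row is a dict with hypothetical `observation_unit`, `variable`, `timestamp` and `value` keys. It illustrates the behaviour only and is not the Perl code in Observations.pm: rows that differ only in timestamp pass, while a repeated unit + variable + timestamp key is reported as an error.

```python
from collections import Counter
from typing import Iterable, Mapping


def find_duplicate_keys(
    observations: Iterable[Mapping[str, str]],
) -> list[tuple[str, str, str]]:
    """Return every (observation unit, variable, timestamp) key seen more than once.

    Hypothetical helper mirroring the described parser behaviour: repeat
    observations that differ only in timestamp are allowed, identical keys
    are flagged.
    """
    counts = Counter(
        (obs["observation_unit"], obs["variable"], obs["timestamp"])
        for obs in observations
    )
    return sorted(key for key, n in counts.items() if n > 1)


if __name__ == "__main__":
    rows = [
        {"observation_unit": "plot1", "variable": "height", "timestamp": "2021-07-01T10:00:00", "value": "12"},
        {"observation_unit": "plot1", "variable": "height", "timestamp": "2021-07-08T10:00:00", "value": "15"},
        {"observation_unit": "plot1", "variable": "height", "timestamp": "2021-07-08T10:00:00", "value": "16"},
    ]
    # The first two rows are a valid repeat; the third collides with the second.
    print(find_duplicate_keys(rows))
```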

ctucker3 • Jul 26 '21 17:07

Definitely a complicated one. Historically, multiple measurements over time have usually been handled via trait ontologies: first built into the ontology itself (CBSD 3-month eval, 6-month eval, 9-month eval, etc.), then factored out into independent time ontologies (day1-day365, week1-week52, month1-month12) that can be combined with the core ontology via 'Compose a new trait'.

Until recently most data has not had timestamps attached, and from what I can tell the default behavior (storing repeat observations of the same trait if the user didn't choose to overwrite) was an oversight and long-running bug. It caused strange behavior, such as negative values in the % missing column of the summary stats on trial detail pages, and was not accommodated by any analysis tools or download formats.

I like the idea of handling things differently, though. There seems to be a lot of demand for storing repeated measurements differentiated only by timestamp, especially now that the Field Book-Breedbase API connection makes submitting observations with timestamps so easy. It will mean lots of changes, though: to upload parsers, summary and analysis tools, and download formats.

Lukas is back at the end of next week; maybe we can discuss in more detail then?

bellerbrock • Aug 04 '21 17:08

Sounds like a good idea. Let me know what time works best.

MFlores2021 • Aug 09 '21 22:08

Features from the discussion today:

Upload:

  • Separate the upload phenotypes tool into 'upload phenotypes' and 'update phenotypes', analogous to BrAPI /observations POST and PUT, respectively.
  • Improve data checks, similar to the diagram below, but check for uniqueness of observation unit + trait name + timestamp before checking the value.
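The ordering in the second bullet (key uniqueness first, value second) can be sketched as a small decision function. This is a hypothetical Python illustration, not Breedbase code: stored observations are assumed to be a dict keyed by (observation unit, trait name, timestamp), and `update=False`/`update=True` stand in for the proposed 'upload phenotypes' (POST) and 'update phenotypes' (PUT) tools. The `Action` names are assumptions.

```python
from enum import Enum


class Action(Enum):
    INSERT = "insert"      # key not stored yet: keep it, even if the unit/trait has other timestamps
    SKIP = "skip"          # same key with the same value already stored
    CONFLICT = "conflict"  # same key, different value: only an update may change it
    UPDATE = "update"      # same key, different value, and the caller asked to overwrite


# Hypothetical key: (observation unit, trait name, timestamp)
Key = tuple[str, str, str]


def decide(existing: dict[Key, str], key: Key, value: str, update: bool) -> Action:
    """Decide what to do with one incoming observation.

    Uniqueness of unit + trait + timestamp is checked first; the value is
    only compared when that key already exists.
    """
    if key not in existing:
        return Action.INSERT
    if existing[key] == value:
        return Action.SKIP
    return Action.UPDATE if update else Action.CONFLICT
```

With `existing = {("plot1", "height", "2021-07-01T10:00:00"): "12"}`, a new timestamp for plot1/height is inserted as a repeat observation, re-sending "12" for the same timestamp is skipped, and sending "13" for it is a conflict on upload but an overwrite on update. Whether the update path should also insert unseen keys is left open here.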

[Diagram: upload decisions]

Display:

  • Fix % missing calculations in the summary stats table to handle multiple observations.
  • Add additional summary features (table with timestamps? time series visualization?) to display multiple observations.
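One plausible way the negative % missing values arise is counting stored values instead of distinct observation units. The Python sketch below contrasts the two, assuming a hypothetical per-trait input: the list of unit IDs in the trial and the unit ID of every stored value for that trait. Neither function is the actual Breedbase calculation.

```python
from typing import Sequence


def percent_missing_naive(n_units: int, n_values: int) -> float:
    """Buggy assumption: every stored value belongs to a different unit."""
    return 100.0 * (n_units - n_values) / n_units


def percent_missing(unit_ids: Sequence[str], observed_units: Sequence[str]) -> float:
    """Percentage of units with no observation at all for one trait.

    `observed_units` holds the unit ID of every stored value, so a unit
    measured on three dates appears three times; set() collapses the repeats.
    """
    units = set(unit_ids)
    covered = units & set(observed_units)
    return 100.0 * (len(units) - len(covered)) / len(units)


if __name__ == "__main__":
    units = ["plot1", "plot2", "plot3", "plot4"]
    observed = ["plot1", "plot1", "plot1", "plot2", "plot2"]  # repeats on plot1 and plot2
    print(percent_missing_naive(len(units), len(observed)))  # -25.0
    print(percent_missing(units, observed))                  # 50.0
```

Counting distinct units keeps the column between 0 and 100 no matter how many repeat observations a unit has.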

Download:

  • Add an option to the existing download format to extract a single value from multiple observations, with a choice to calculate the average or use the first or latest value.
  • Add new download format consistent with fieldbook database format (column for observationunits, column for trait names, column for values, column for timestamps, column for operators)
  • Explore additional download formats. Maybe append '_timestamp' to trait names in column header and append timestamp value to trait value in the spreadsheet cell, with multiple measurements separated by semicolons?
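The first download option (one value per unit and trait, chosen by average, first or latest) could look roughly like this Python sketch. It assumes the repeats for one unit/trait arrive as hypothetical (timestamp, value) pairs with ISO 8601 timestamps that `datetime.fromisoformat` can parse (before Python 3.11 that excludes a trailing `Z`); "first" and "latest" are taken by timestamp, not by upload order.

```python
from datetime import datetime
from statistics import mean


def collapse(values: list[tuple[str, float]], how: str) -> float:
    """Reduce repeated (timestamp, value) observations to a single value.

    how: "average", "first" (earliest timestamp) or "latest" (most recent timestamp).
    """
    if not values:
        raise ValueError("no observations to collapse")
    # Order by parsed timestamp so "first"/"latest" do not depend on input order.
    ordered = sorted(values, key=lambda tv: datetime.fromisoformat(tv[0]))
    if how == "average":
        return mean(v for _, v in ordered)
    if how == "first":
        return ordered[0][1]
    if how == "latest":
        return ordered[-1][1]
    raise ValueError(f"unknown option: {how!r}")


if __name__ == "__main__":
    obs = [
        ("2021-07-08T10:00:00", 15.0),
        ("2021-07-01T10:00:00", 12.0),
        ("2021-07-15T10:00:00", 18.0),
    ]
    for option in ("average", "first", "latest"):
        print(option, collapse(obs, option))
```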

bellerbrock • Aug 25 '21 20:08

@lukasmueller I uploaded a phenotype file (in Field Book export format) that had multiple timestamps and realized that only the data with the first timestamp was kept. I went back, split the data by date and uploaded each part individually. The second timestamp's data uploaded fine, but the third threw an error (see attachment), and now I can no longer upload anything without getting this error. I also still can't delete trials or trait data in the database, so I can't go back and try again. I think this relates to this issue.

[Screenshot 2024-07-04 at 11:20:40 AM]

hkmanching • Jul 04 '24 15:07