isa-api icon indicating copy to clipboard operation
isa-api copied to clipboard

Add tracking to data file type column names 2.

Open ptth222 opened this issue 11 months ago • 9 comments

Rebasing the original branch for this did not go well, so I manually created another one. This PR should replace #510. I was able to get this to pass all of the tests and now multiple ".* File" columns won't cause an issue. We don't need this capability like I thought we did originally, but I figured I would go on and make it work anyway.

ptth222 avatar Mar 17 '24 01:03 ptth222

Coverage Status

coverage: 81.341% (+0.08%) from 81.257% when pulling e50e65e6c8eb29706560e5b820de181ad07db9f2 on ptth222:fix-data-file-name-bug2 into 7d5f19f9c7d2738849b2ac62e6d3c71603da8151 on ISA-tools:issue-511.

coveralls avatar Mar 17 '24 01:03 coveralls

as discussed today:

  • [ ] ISA Tab specification allow only 1 "Raw Data File" per Assay Table
  • [ ] "Assay Name" is needed as "Raw Data File" field can not stand alone. To ensure output "Assay Name" is output at serialization time, the protocol type should be data acquisition or a synonym has been added to the ./resources/yaml/protocol-types.yml" file
  • [ ] similarly, "Derived Data File" can not stand alone and should be associated with a "Data Transformation Name" field, which is output when serializing to ISA-Tab if a protocol type "data transformation" is found or a string declared as synonym to "data transformation" in the ./resources/yaml/protocol-types.yml" file
  • [ ] There can be more than one occurrence of "Derived Data File"
  • [ ] "Assay Name" header is the default value found associated with "Raw Data File", but 'specialization' exist:

"NMR Assay Name" with "Free Induction Decay Data File" with 'protocol type' = "NMR spectroscopy" (or declared synonyms) "MS Assay Name" with "Raw Spectral Data File" with 'protocol type' = "mass spectrometry" (or declared synonyms)

proccaserra avatar Mar 18 '24 21:03 proccaserra

I did an example more like what was discussed in the meeting. I had to also add tracking for the columns that are added for known protocol types. The problem that this PR solves is that for any ".* File" column, if there is more than 1 then the values don't get written out correctly. Solving this wasn't as easy as it was for a similar issue with Protocol REF columns. Due to how graph nodes were previously handled.

Old Table Output:

Sample Name Protocol REF MS Assay Name Raw Data File Protocol REF Data Transformation Name Derived Data File Protocol REF Data Transformation Name Derived Data File
sample1 protocol1 process1 datafile1.raw protocol2 process3 datafile3.raw protocol3 process3 datafile3.raw
sample2 protocol1 process4 datafile4.raw protocol3 process5 datafile5.raw process5 datafile5.raw

New Table Output:

Sample Name Protocol REF MS Assay Name Raw Data File Protocol REF Data Transformation Name Derived Data File Protocol REF Data Transformation Name Derived Data File
sample1 protocol1 process1 datafile1.raw protocol2 process2 datafile2.raw protocol3 process3 datafile3.raw
sample2 protocol1 process4 datafile4.raw protocol3 process5 datafile5.raw

ptth222 avatar Mar 21 '24 08:03 ptth222

PRS to test this situation, i.e., several files generated by one data acquisition:

Sample Name Protocol REF MS Assay Name Raw Data File Protocol REF Data Transformation Name Derived Data File Protocol REF Data Transformation Name Derived Data File
sample1 protocol1 process1 datafile1.raw protocol2 process2 datafile2.raw protocol3 process3 datafile3.raw
sample1 protocol1 process1 datafile4.raw protocol2 process2 datafile2.raw

proccaserra avatar Mar 22 '24 11:03 proccaserra

I took a small break from what I got pulled away to work on to address the comments raised when we last met. I added comments and broke up _build_paths_and_indexes() into smaller pieces and removed all of the logic that identified data nodes with 'File'. Let me know if this is still not enough.

ptth222 avatar May 20 '24 21:05 ptth222

The only conflict is where I deleted a line in testing that was printing. Can this be resolved?

ptth222 avatar Jun 28 '24 18:06 ptth222

The only conflict is where I deleted a line in testing that was printing. Can this be resolved?

@ptth222 there is indeed an outstanding merge conflict but resolving it wouldn't allow to merge still as the PR does not solve the issue. We have not found a fix yet unfortunately

proccaserra avatar Jul 01 '24 17:07 proccaserra

What issue? I addressed everything mentioned in the history above, and all of the tests pass. If there is an issue can we create a test for it?

ptth222 avatar Jul 01 '24 22:07 ptth222

@ptth222 finally managed to find time and issue causing the problem while merging into my local branch. All tests are now passing locally but need to investigate one more case. Apologies for the delay

proccaserra avatar Jul 09 '24 10:07 proccaserra

@ptth222 closing this as your changes have been integrated in issue-511.

proccaserra avatar Jul 15 '24 11:07 proccaserra