spinalcordtoolbox
spinalcordtoolbox copied to clipboard
Output JSON sidecar files after processing a la BIDS
In the context processing pipeline and databases, in order to keep track of the software and version used to generate derivatives, it would be useful if SCT would output a JSON sidecar (following BIDS convention). E.g.:
sct_deepseg_sc -i -FILE.nii -c t2
Would output:
FILE_seg.nii
FILE_seg.json
And the content of the JSON could be the following (inspired by this link):
{
"Name": "FMRIPREP Outputs",
"BIDSVersion": "1.4.0",
"DatasetType": "derivative",
"GeneratedBy": [
{
"Name": "sct",
"Version": "5.3.0",
},
{
"Author": "John Doe",
"Description": "Manual correction of spinal cord segmentation",
"Date": "2021-05-10 10:10:28",
}
],
"SourceDatasets": [
{
"DOI": "XXX",
"URL": "data.neuro.polymtl.ca:datasets/uk-biobank-processed",
"Version": "1.0.1"
}
]
}
Another useful link: https://hackmd.io/@effigies/bids-derivatives-readme
We came across this point again with @tzebre and @jcohenadad when discussing if segmentations produced by the nnU-Net framework should be accompanied by JSON sidecar (see discussion here).
Since this point is periodically repeated from time to time (https://github.com/spinalcordtoolbox/spinalcordtoolbox/issues/2807, https://github.com/spinalcordtoolbox/spinalcordtoolbox/issues/4023), I am increasing its priority.
At first I thought that this might be possible to do using our compat/launcher.py
file, since this file contains a set of steps that are run for every available script. I was envisioning some very generic code -- a barebones json
file containing the SCT command, SCT version, and date, which launcher.py
would definitely have access to. This would let us avoid repeating ourselves in every script.
But, there are two main issues with this:
- The output folder varies from script to script, and also varies at runtime via the
-o
/-ofolder
options. (Related: #4070). This is an issue becauselauncher.py
can't access the resulting output folder from outside the script. - We have SCT CLI scripts that we don't want to generate sidecar files for, namely ones that don't generate an output image (e.g.
sct_version
,sct_compute_hausdorff_distance
, etc.).
In theory, we could fix both of these problems by propagating the output files/output directory back to launcher.py
after the script completes. (And, for scripts that we don't want to generate a sidecar for, maybe we could return None
?)
Aside: Specifying return values for
main
functions would have added benefits for our test suite, since we could be explicit with which files are generated by each script, and use that to more easily validate (and clean up) output files.
However, fetching a return value would require us to change how launcher.py
is written (since currently we're using a subprocess):
https://github.com/spinalcordtoolbox/spinalcordtoolbox/blob/e740edf4c8408ffa44ef7ba23ad068c6d07e4b87/spinalcordtoolbox/compat/launcher.py#L47
But, maybe this would be a welcome change? Does using a subprocess incur a performance hit right now? ((Changing the subprocess
behavior might also let us move the init_sct()
function call inside the wrapper, too.))
Otherwise, if these changes are too broad in scope, then we could just as easily make a simple function and call it at the end of each script, too. I guess it just depends on how badly we want a within-process wrapper for executing code before and after each CLI script? :thinking:
My first thoughts are that:
- Indeed it's not every script that should generate a sidecar. (And possibly some scripts produce multiple images, in which case there should be a sidecar for each?) So, the mechanism has to be conditional.
- Yes, the launcher is missing some of the information (most importantly, output location), so maybe it shouldn't have the responsibility for generating the file. But, maybe it's a good place to generate some of the contents of the sidecar?
- On the other hand, I think the main upside for generating the sidecar in the launcher is that it's such a strong nudge for script writers.
- Some other ideas to explore:
- Is this sidecar functionality similar to the QC report functionality?
- Or maybe the code for generating a sidecar should live close to the image saving code?
On the other hand, I think the main upside for generating the sidecar in the launcher is that it's such a strong nudge for script writers.
@mguaypaq I'm not sure I understand that argument. Are you referring to users who write SHELL scripts that consist of a sequence of SCT commands to process their data (eg: batch_processing.sh)? If so, I don't see how generating the sidecar in the launcher (vs. in individual SCT's CLI scripts) will encourage people to use scripts (because for them, it will be opaque where the sidecars are actually generated).
Overall, this discussion is related to the long "-o/-ofolder" debate. I am wondering if we could think of a -obids flag, that would output things in the derivatives according to BIDS. And if it is on, it would generate the sidecar. Then, we could implement the -obids flag in the appropriate CLIs, and so we would not worry about the launcher generating sidecars for scripts like sct_version
(on which -obids would not be implemented).
@mguaypaq I'm not sure I understand that argument. Are you referring to users who write SHELL scripts that consist of a sequence of SCT commands to process their data (eg: batch_processing.sh)?
Sorry for being unclear, I meant writers of new sct_*
commands (most likely, students or postdocs at NeuroPoly working on SCT itself). So, if the "template" for new sct_*
commands asked for sidecar information by default (for example, returned by main()
as @joshuacwnewton suggested), whoever programs it would necessarily have to consider sidecars.
Small update: I'm going to start experimenting with a launcher.py
prototype today. :)
In our 2024-02-21 SCT Meeting, we discussed JSON sidecars and how they should be implemented.
We came to the following conclusions:
- The main motivation for sidecar files comes from the
manual-correction
pipeline. Meaning, sidecar generation is most useful for tracking the provenance of segmentation files that may be manually corrected in the future. (More context in this comment.) - So, this feature doesn't necessarily need to be implemented for every SCT script that generates output. Instead, we can start with
sct_propseg
/sct_deepseg_sc
/sct_deepseg
, then expand the functionality if need be.
This should limit the scope of this feature and make it much more straightforward to implement, as we only need to ensure compatibility with manual-correction
for now. :)