data-multi-subject
Deal with subjects scanned on different scanners
There are 15 sub-tokyo subjects; however, there are actually only 5 real subjects involved, each scanned on three different MRI scanners. For example:
u108545@joplin:~/data-multi-subject$ ls sub-tokyo*01
sub-tokyo750w01:
anat dwi
sub-tokyoIngenia01:
anat dwi
sub-tokyoSkyra01:
anat dwi
u108545@joplin:~/data-multi-subject$ egrep 'sub-tokyo.*?01[[:space:]]*(F|M)' participants.tsv
sub-tokyo750w01 M 25 - - 2019-10-01 tokyo750w the University of Tokyo GE MR750w - 24_LX_MR_Software_release:DV24.0_R01_1344.a "K. Kamiya, Y. Suzuki"
sub-tokyoIngenia01 M 25 - - 2019-10-01 tokyoIngenia the University of Tokyo Philips Ingenia - 5.3.1_5.3.1.1 "K. Kamiya, Y. Suzuki"
sub-tokyoSkyra01 M 25 - - 2019-10-01 tokyoSkyra the University of Tokyo Siemens Skyra HeadNeck_20 syngo_MR_E11 "K. Kamiya, Y. Suzuki"
It would be safer, and probably more BIDS-compliant, to represent the "different scanner" field using an acq- entity (or possibly ses-) and put all of these scans under a single folder (sub-tokyo01). Then we would only need to record their tabular data once in participants.tsv, and repairs like #96 would not be so fraught to perform.
Discovered in https://github.com/spine-generic/data-multi-subject/pull/96#issuecomment-930497296
This is a valid concern. However, merging these participants would break the analysis code, so there are pros and cons here.
I'll fix the analysis code.
Turns out, the hardware field already has a place to go in BIDS: it goes in the .json, not in the filename, and we have this data in the right place already:
u108545@joplin:~/data-multi-subject$ grep ManufacturersModelName sub-tokyo*01/anat/*.json
sub-tokyo750w01/anat/sub-tokyo750w01_acq-MToff_MTS.json: "ManufacturersModelName": "DISCOVERY_MR750w",
sub-tokyo750w01/anat/sub-tokyo750w01_acq-MTon_MTS.json: "ManufacturersModelName": "DISCOVERY_MR750w",
sub-tokyo750w01/anat/sub-tokyo750w01_acq-T1w_MTS.json: "ManufacturersModelName": "DISCOVERY_MR750w",
sub-tokyo750w01/anat/sub-tokyo750w01_T1w.json: "ManufacturersModelName": "DISCOVERY_MR750w",
sub-tokyo750w01/anat/sub-tokyo750w01_T2star.json: "ManufacturersModelName": "DISCOVERY_MR750w",
sub-tokyo750w01/anat/sub-tokyo750w01_T2w.json: "ManufacturersModelName": "DISCOVERY_MR750w",
sub-tokyoIngenia01/anat/sub-tokyoIngenia01_acq-MToff_MTS.json: "ManufacturersModelName": "Ingenia_CX",
sub-tokyoIngenia01/anat/sub-tokyoIngenia01_acq-MTon_MTS.json: "ManufacturersModelName": "Ingenia_CX",
sub-tokyoIngenia01/anat/sub-tokyoIngenia01_acq-T1w_MTS.json: "ManufacturersModelName": "Ingenia_CX",
sub-tokyoIngenia01/anat/sub-tokyoIngenia01_T1w.json: "ManufacturersModelName": "Ingenia_CX",
sub-tokyoIngenia01/anat/sub-tokyoIngenia01_T2star.json: "ManufacturersModelName": "Ingenia_CX",
sub-tokyoIngenia01/anat/sub-tokyoIngenia01_T2w.json: "ManufacturersModelName": "Ingenia_CX",
sub-tokyoSkyra01/anat/sub-tokyoSkyra01_acq-MToff_MTS.json: "ManufacturersModelName": "Skyra",
sub-tokyoSkyra01/anat/sub-tokyoSkyra01_acq-MTon_MTS.json: "ManufacturersModelName": "Skyra",
sub-tokyoSkyra01/anat/sub-tokyoSkyra01_acq-T1w_MTS.json: "ManufacturersModelName": "Skyra",
sub-tokyoSkyra01/anat/sub-tokyoSkyra01_T1w.json: "ManufacturersModelName": "Skyra",
sub-tokyoSkyra01/anat/sub-tokyoSkyra01_T2star.json: "ManufacturersModelName": "Skyra",
sub-tokyoSkyra01/anat/sub-tokyoSkyra01_T2w.json: "ManufacturersModelName": "Skyra",
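The grep above can also be done from analysis code. A minimal sketch, assuming the sidecars are plain JSON objects and that the dataset root is passed in; the function name scanner_models is hypothetical and not part of the existing analysis code:

```python
import json
from pathlib import Path


def scanner_models(dataset_root):
    """Map each subject directory name to the set of scanner models in its sidecars."""
    models = {}
    # The pattern matches both sub-XX/anat/*.json and sub-XX/ses-YY/anat/*.json.
    for sidecar in sorted(Path(dataset_root).glob("sub-*/**/*.json")):
        with open(sidecar) as f:
            metadata = json.load(f)
        model = metadata.get("ManufacturersModelName")
        if model is not None:
            # The first path component below the root is the subject directory.
            subject = sidecar.relative_to(dataset_root).parts[0]
            models.setdefault(subject, set()).add(model)
    return models
```

Called on the dataset root, this would be expected to return one model per current sub-tokyo* directory, and several models per subject if the per-scanner directories were merged.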
And BIDS recommends encoding multiple visits/scans by nesting them a level deeper under ses-<label>/.
I propose:
- Either not encoding the scanner in the filename at all but adding a session field, or encoding it in the 'session' field:
sub-tokyo{scanner}{id} -> sub-tokyo{id}_ses-{scanner}
So, either:
u108545@joplin:~/data-multi-subject$ mkdir -p sub-tokyo05 && git mv sub-tokyoIngenia05/ sub-tokyo05/ses-01
or
u108545@joplin:~/data-multi-subject$ mkdir -p sub-tokyo05 && git mv sub-tokyoIngenia05/ sub-tokyo05/ses-Ingenia
but repeated for every subject. For most subjects with only one session, BIDS still wants us to nest a ses-01/ folder:
The extra session layer (at least one /ses-
- Merging the tokyo subjects, either:
git mv sub-tokyoSkyra{id} sub-tokyo{id}/ses-02
git mv sub-tokyo750w{id} sub-tokyo{id}/ses-03
or
git mv sub-tokyoSkyra{id} sub-tokyo{id}/ses-Skyra
git mv sub-tokyo750w{id} sub-tokyo{id}/ses-750w
- Moving the date, manufacturer, manufacturers_model_name columns from participants.tsv to per-subject sub-tokyo{id}/sub-tokyo{id}_sessions.tsv files.
- Changing the analysis code to parse out the information when it needs it from either the .json files or the _sessions.tsv files, not the filenames.
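To make the scanner-named variant of the merge concrete, here is a rough sketch that derives the new location from an old directory name and prints the corresponding commands. The helper merged_location is hypothetical, and the scanner labels are taken only from the directory names listed earlier in this thread. It covers the directory move only; the files inside would presumably also need the ses- entity added to their names.

```python
import re

# Scanner labels as they appear in the existing sub-tokyo* directory names.
SCANNERS = ("750w", "Ingenia", "Skyra")
PATTERN = re.compile(
    r"^sub-tokyo(?P<scanner>%s)(?P<id>\d+)$" % "|".join(SCANNERS)
)


def merged_location(old_name):
    """Return (subject_dir, session_dir) for an old per-scanner directory name."""
    match = PATTERN.match(old_name)
    if match is None:
        raise ValueError(f"not a per-scanner tokyo subject: {old_name!r}")
    return f"sub-tokyo{match['id']}", f"ses-{match['scanner']}"


# Print the commands rather than running them, so they can be reviewed first.
for old in ("sub-tokyo750w01", "sub-tokyoIngenia01", "sub-tokyoSkyra01"):
    subject, session = merged_location(old)
    print(f"mkdir -p {subject} && git mv {old}/ {subject}/{session}")
```

Printing instead of executing keeps the sketch side-effect free; the output could be piped to a shell once it looks right.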
Thank you @kousu, this seems like a very reasonable plan. In terms of index vs. scanner name in the filename, I have a slight preference for encoding the scanner name in the filename, just because it is more human-friendly.
Great. I can do that!
Reviving this thread, given a recent comment https://github.com/spine-generic/data-multi-subject/issues/166 and the demographic-based project from @renelabounek. We should find a reasonable strategy to deal with the same subjects being scanned at multiple sites. The solution proposed in https://github.com/spine-generic/data-multi-subject/issues/102#issuecomment-969444920 is problematic, in that the logic of the analysis code, and the results, would be drastically different. I'm wondering if simply adding a column to https://github.com/spine-generic/data-multi-subject/blob/113b258695074b77d40ba987474eddc14f9d9698/participants.tsv with an arbitrary ID for each subject could properly address this? Then, for projects where the demographics of the subjects are relevant (e.g., @renelabounek's project), the specific analysis code could use that information (e.g., by selecting non-duplicate subjects based on their IDs as opposed to the participant_id).
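As a rough illustration of what that selection could look like, assuming a new column in participants.tsv (called subject_uid here purely as a placeholder; neither the column name nor the ID values exist in the dataset today):

```python
import csv
import io


def unique_subjects(tsv_text, id_column="subject_uid"):
    """Keep the first participants.tsv row seen for each value of id_column."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    seen = set()
    kept = []
    for row in reader:
        uid = row[id_column]
        if uid not in seen:
            seen.add(uid)
            kept.append(row)
    return kept


# Illustrative rows only: the subject_uid values are made up for this example.
example = (
    "participant_id\tsubject_uid\n"
    "sub-tokyo750w01\ttokyo01\n"
    "sub-tokyoIngenia01\ttokyo01\n"
    "sub-tokyoSkyra01\ttokyo01\n"
    "sub-tokyoIngenia05\ttokyo05\n"
)
for row in unique_subjects(example):
    print(row["participant_id"], row["subject_uid"])
```

Keeping the first row per ID is an arbitrary choice for the sketch; a demographics-oriented analysis might prefer a specific scanner, or check that the duplicated rows agree on age and sex before dropping any.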