data-multi-subject Deal with subjects scanned on different scanners

There are 15 sub-tokyo subjects; however, only 5 real subjects are actually involved, each scanned on three different MRI scanners. For example:

```
u108545@joplin:~/data-multi-subject$ ls sub-tokyo*01
sub-tokyo750w01:
anat  dwi

sub-tokyoIngenia01:
anat  dwi

sub-tokyoSkyra01:
anat  dwi
u108545@joplin:~/data-multi-subject$ egrep 'sub-tokyo.*?01[[:space:]]*(F|M)' participants.tsv 
sub-tokyo750w01	M	25	-	-	2019-10-01	tokyo750w	the University of Tokyo	GE	MR750w	-	24_LX_MR_Software_release:DV24.0_R01_1344.a	"K. Kamiya, Y. Suzuki"
sub-tokyoIngenia01	M	25	-	-	2019-10-01	tokyo	Ingenia the University of Tokyo	Philips	Ingenia	-	5.3.1_5.3.1.1"K. Kamiya, Y. Suzuki"
sub-tokyoSkyra01	M	25	-	-	2019-10-01	tokyoSkyra	the University of Tokyo	Siemens	Skyra	HeadNeck_20	syngo_MR_E11	"K. Kamiya, Y. Suzuki"
```

It would be safer, and probably more BIDS-compliant, to represent the "different scanner" field with an acq- entity (or possibly ses-) and put all these scans under a single folder (sub-tokyo01). Then we would only need to record their tabular data once in participants.tsv, and repairs like #96 would be much less fraught.

Discovered in https://github.com/spine-generic/data-multi-subject/pull/96#issuecomment-930497296

Nov 15 '21 21:11 kousu

This is a valid concern. However, merging these participants would break the analysis code, so there are pros and cons here.

Nov 15 '21 21:11 jcohenadad

I'll fix the analysis code.

Nov 15 '21 21:11 kousu

It turns out the hardware information already has a place in BIDS: it goes in the .json sidecar, not in the filename, and our data already has it in the right place:

```
u108545@joplin:~/data-multi-subject$ grep ManufacturersModelName sub-tokyo*01/anat/*.json 
sub-tokyo750w01/anat/sub-tokyo750w01_acq-MToff_MTS.json:	"ManufacturersModelName": "DISCOVERY_MR750w",
sub-tokyo750w01/anat/sub-tokyo750w01_acq-MTon_MTS.json:	"ManufacturersModelName": "DISCOVERY_MR750w",
sub-tokyo750w01/anat/sub-tokyo750w01_acq-T1w_MTS.json:	"ManufacturersModelName": "DISCOVERY_MR750w",
sub-tokyo750w01/anat/sub-tokyo750w01_T1w.json:	"ManufacturersModelName": "DISCOVERY_MR750w",
sub-tokyo750w01/anat/sub-tokyo750w01_T2star.json:	"ManufacturersModelName": "DISCOVERY_MR750w",
sub-tokyo750w01/anat/sub-tokyo750w01_T2w.json:	"ManufacturersModelName": "DISCOVERY_MR750w",
sub-tokyoIngenia01/anat/sub-tokyoIngenia01_acq-MToff_MTS.json:	"ManufacturersModelName": "Ingenia_CX",
sub-tokyoIngenia01/anat/sub-tokyoIngenia01_acq-MTon_MTS.json:	"ManufacturersModelName": "Ingenia_CX",
sub-tokyoIngenia01/anat/sub-tokyoIngenia01_acq-T1w_MTS.json:	"ManufacturersModelName": "Ingenia_CX",
sub-tokyoIngenia01/anat/sub-tokyoIngenia01_T1w.json:	"ManufacturersModelName": "Ingenia_CX",
sub-tokyoIngenia01/anat/sub-tokyoIngenia01_T2star.json:	"ManufacturersModelName": "Ingenia_CX",
sub-tokyoIngenia01/anat/sub-tokyoIngenia01_T2w.json:	"ManufacturersModelName": "Ingenia_CX",
sub-tokyoSkyra01/anat/sub-tokyoSkyra01_acq-MToff_MTS.json:	"ManufacturersModelName": "Skyra",
sub-tokyoSkyra01/anat/sub-tokyoSkyra01_acq-MTon_MTS.json:	"ManufacturersModelName": "Skyra",
sub-tokyoSkyra01/anat/sub-tokyoSkyra01_acq-T1w_MTS.json:	"ManufacturersModelName": "Skyra",
sub-tokyoSkyra01/anat/sub-tokyoSkyra01_T1w.json:	"ManufacturersModelName": "Skyra",
sub-tokyoSkyra01/anat/sub-tokyoSkyra01_T2star.json:	"ManufacturersModelName": "Skyra",
sub-tokyoSkyra01/anat/sub-tokyoSkyra01_T2w.json:	"ManufacturersModelName": "Skyra",
```
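Since the scanner model already lives in each sidecar, analysis code could read it from there rather than from the subject label. Below is a minimal bash sketch; `sidecar_field` and `subject_models` are hypothetical helper names, and it assumes (as in the grep output above) that each key sits on its own line with a plain string value and no escaped quotes. Where `jq` is installed, `jq -r .ManufacturersModelName FILE` is the more robust choice.

```bash
#!/usr/bin/env bash
# Hypothetical helpers: read scanner info from BIDS JSON sidecars instead of
# parsing it out of the subject label.

# sidecar_field FILE KEY
# Prints the string value of KEY from a JSON sidecar. Assumes one
# "Key": "value" pair per line and no escaped quotes inside the value.
sidecar_field() {
  local json=$1 key=$2
  sed -n 's/^[[:space:]]*"'"$key"'"[[:space:]]*:[[:space:]]*"\([^"]*\)".*$/\1/p' "$json" | head -n 1
}

# subject_models SUBJECT_DIR
# Lists the distinct ManufacturersModelName values found in any sidecar under
# SUBJECT_DIR. find recurses, so this works with or without a ses-<label>/ level.
subject_models() {
  local f
  find "$1" -type f -name '*.json' -print0 |
    while IFS= read -r -d '' f; do
      sidecar_field "$f" ManufacturersModelName
    done | sort -u
}
```

For example, `subject_models sub-tokyoSkyra01` should print a single `Skyra` line as long as every sidecar in that folder agrees, as the anat sidecars above do; more than one line would flag a subject folder that mixes scanners.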

And BIDS recommends encoding multiple visits/scans by nesting them a level deeper under ses-<label>/.

I propose either dropping the scanner from the filename entirely and adding a session level instead, or encoding the scanner in the session label: sub-tokyo{scanner}{id} -> sub-tokyo{id}_ses-{scanner}

So, either:
```
u108545@joplin:~/data-multi-subject$ mkdir -p sub-tokyo05 && git mv sub-tokyoIngenia05/ sub-tokyo05/ses-01
```
or
```
u108545@joplin:~/data-multi-subject$ mkdir -p sub-tokyo05 && git mv sub-tokyoIngenia05/ sub-tokyo05/ses-Ingenia
```
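The label mapping sub-tokyo{scanner}{id} -> sub-tokyo{id}_ses-{scanner} is mechanical enough to script. A minimal bash sketch for the scanner-named variant, assuming the only scanner prefixes are the three seen above (`750w`, `Ingenia`, `Skyra`); `split_tokyo_id` is a hypothetical name, and sending every other subject to session `01` is an assumption matching the ses-01 proposal for single-session subjects.

```bash
#!/usr/bin/env bash
# split_tokyo_id PARTICIPANT_ID
# Prints "<new participant_id> <session label>" for the scanner-named variant:
#   sub-tokyoIngenia05 -> "sub-tokyo05 Ingenia"
# Any other ID is returned unchanged with session label "01" (assumption:
# every non-tokyo subject has a single session).
split_tokyo_id() {
  local id=$1
  if [[ $id =~ ^sub-tokyo(750w|Ingenia|Skyra)([0-9]+)$ ]]; then
    # BASH_REMATCH[1] is the scanner, BASH_REMATCH[2] the numeric ID.
    printf '%s %s\n' "sub-tokyo${BASH_REMATCH[2]}" "${BASH_REMATCH[1]}"
  else
    printf '%s %s\n' "$id" 01
  fi
}
```

In a script, `read -r new_id ses < <(split_tokyo_id sub-tokyoSkyra01)` would set `new_id=sub-tokyo01` and `ses=Skyra`.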
but repeated for every subject. Most subjects have only one session, but BIDS still wants us to nest them under a ses-01/ folder:

> The extra session layer (at least one /ses-<label> subfolder) SHOULD be added for all subjects
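Applying that recommendation across the dataset starts with knowing which subjects still lack a session level. A small bash sketch (the helper name is hypothetical) that lists every `sub-*` folder with no `ses-*` subfolder, run against the dataset root:

```bash
#!/usr/bin/env bash
# subjects_without_session [DATASET_DIR]
# Prints the name of every sub-* directory that has no ses-* subdirectory,
# i.e. the subjects that would still need to be nested under ses-01/.
subjects_without_session() {
  local root=${1:-.} d
  for d in "$root"/sub-*/; do
    [ -d "$d" ] || continue  # no sub-* directories: the glob stayed literal
    if [ -z "$(find "$d" -mindepth 1 -maxdepth 1 -type d -name 'ses-*' -print -quit)" ]; then
      basename "$d"
    fi
  done
}
```

Once the migration is done, `subjects_without_session .` should print nothing, which makes it a cheap check to run in CI.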

Merging the tokyo subjects:

either

```
git mv sub-tokyoSkyra{id} sub-tokyo{id}/ses-02
git mv sub-tokyo750w{id} sub-tokyo{id}/ses-03
```

or

```
git mv sub-tokyoSkyra{id} sub-tokyo{id}/ses-Skyra
git mv sub-tokyo750w{id} sub-tokyo{id}/ses-750w
```
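Moving a folder with `git mv` does not rename the files inside it, and BIDS expects the `ses-<label>` entity in the filenames too (e.g. `sub-tokyo01_ses-Skyra_T1w.json`). A minimal bash sketch of the scanner-named variant that also rewrites each filename prefix; all helper names are hypothetical, subject IDs 01 to 05 are assumed from the five real subjects mentioned at the top, `mapfile -d ''` needs bash 4.4 or later, and files outside the subject folders (e.g. derivatives) are not handled.

```bash
#!/usr/bin/env bash
# session_path OLD_SUB NEW_SUB SES PATH
# Pure path mapping, useful for a dry run:
#   sub-tokyoSkyra01/anat/sub-tokyoSkyra01_T1w.json
#   -> sub-tokyo01/ses-Skyra/anat/sub-tokyo01_ses-Skyra_T1w.json
session_path() {
  local old_sub=$1 new_sub=$2 ses=$3 path=$4
  local rel=${path#"$old_sub"/} dir='' base from to
  base=${rel##*/}
  [[ $rel == */* ]] && dir=${rel%/*}/
  from="${old_sub}_"
  to="${new_sub}_ses-${ses}_"
  # Only rewrite files that carry the old subject prefix.
  [[ $base == "$from"* ]] && base=$to${base#"$from"}
  printf '%s/ses-%s/%s%s\n' "$new_sub" "$ses" "$dir" "$base"
}

# nest_session OLD_SUB NEW_SUB SES
# Moves every file of OLD_SUB to its new location with git mv, then deletes
# the directories left empty. Run from the dataset root.
nest_session() {
  local old_sub=${1%/} new_sub=$2 ses=$3 f dest
  local -a files
  [ -d "$old_sub" ] || return 1
  # Collect the file list first so the tree is not modified while find walks it.
  mapfile -d '' files < <(find "$old_sub" -type f -print0)
  for f in "${files[@]}"; do
    dest=$(session_path "$old_sub" "$new_sub" "$ses" "$f")
    mkdir -p "$(dirname "$dest")"
    git mv "$f" "$dest" || return
  done
  find "$old_sub" -depth -type d -empty -delete
}

# merge_tokyo_subjects: apply the scanner-named layout to all tokyo subjects.
merge_tokyo_subjects() {
  local id scanner
  for id in 01 02 03 04 05; do
    for scanner in Ingenia Skyra 750w; do
      nest_session "sub-tokyo${scanner}${id}" "sub-tokyo${id}" "$scanner" || return
    done
  done
}
```

Calling `session_path` on a few files first is a cheap way to review the planned names before `merge_tokyo_subjects` touches the git index.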

Then move the date, manufacturer and manufacturers_model_name columns from participants.tsv into per-subject sub-tokyo{id}/sub-tokyo{id}_sessions.tsv files, and change the analysis code to read this information from the .json sidecars or the _sessions.tsv files when it needs it, rather than from the filenames.
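The participants.tsv to _sessions.tsv step could be scripted with awk. This sketch assumes the relevant columns are named `date_of_scan`, `manufacturer` and `manufacturers_model_name` (check the real header; the hypothetical `SESSION_COLUMNS` variable overrides them) and uses the scanner-named session labels. `session_id` is the column BIDS requires in a sessions file. Dropping the moved columns from participants.tsv afterwards is not shown.

```bash
#!/usr/bin/env bash
# write_tokyo_sessions PARTICIPANTS_TSV [OUTDIR]
# For every sub-tokyo{scanner}{id} row, writes "ses-{scanner} <copied columns>"
# to OUTDIR/sub-tokyo{id}/sub-tokyo{id}_sessions.tsv (one header per file).
# Column names are assumptions; override with SESSION_COLUMNS="a b c".
write_tokyo_sessions() {
  local tsv=$1 outdir=${2:-.}
  local cols=${SESSION_COLUMNS:-date_of_scan manufacturer manufacturers_model_name}
  awk -F'\t' -v OFS='\t' -v outdir="$outdir" -v cols="$cols" '
    NR == 1 {
      for (i = 1; i <= NF; i++) idx[$i] = i
      n = split("participant_id " cols, want, " ")
      for (i = 1; i <= n; i++)
        if (!(want[i] in idx)) { print "missing column: " want[i] > "/dev/stderr"; exit 1 }
      next
    }
    $idx["participant_id"] ~ /^sub-tokyo(750w|Ingenia|Skyra)[0-9]+$/ {
      scanner = num = substr($idx["participant_id"], 10)  # drop "sub-tokyo"
      sub(/[0-9]+$/, "", scanner)                         # "Skyra01" -> "Skyra"
      sub(/^(750w|Ingenia|Skyra)/, "", num)               # "Skyra01" -> "01"
      dir = outdir "/sub-tokyo" num
      out = dir "/sub-tokyo" num "_sessions.tsv"
      if (!(out in started)) {                            # first row for this subject
        system("mkdir -p \"" dir "\"")
        header = "session_id"
        for (i = 2; i <= n; i++) header = header OFS want[i]
        print header > out
        started[out] = 1
      }
      row = "ses-" scanner
      for (i = 2; i <= n; i++) row = row OFS $idx[want[i]]
      print row > out
    }
  ' "$tsv"
}
```

Looking columns up by header name, rather than by position, means a misaligned row like the Ingenia one above produces visibly wrong values instead of silently shifting every later column.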

Nov 15 '21 23:11 kousu

Thank you @kousu, this seems like a very reasonable plan. In terms of index vs. scanner name, I do have a slight preference for encoding the scanner name in the filename, just because it is more human-friendly.

Nov 16 '21 02:11 jcohenadad

> thank you @kousu, this seems like a very reasonable plan. In terms of index vs. scanner name in the filename, i do have a slight preference for encoding in the file name, just because it is more human friendly

Great. I can do that!

Nov 16 '21 15:11 kousu

Reviving this thread, given a recent comment (https://github.com/spine-generic/data-multi-subject/issues/166) and the demographics-based project from @renelabounek. We should find a reasonable strategy to deal with the same subjects being scanned at multiple sites. The solution proposed in https://github.com/spine-generic/data-multi-subject/issues/102#issuecomment-969444920 is problematic, in that the logic of the analysis code and the results would have to be drastically different. I'm wondering whether simply adding a column to https://github.com/spine-generic/data-multi-subject/blob/113b258695074b77d40ba987474eddc14f9d9698/participants.tsv with an arbitrary ID for each subject would properly address this. Then, for projects where the demographics of the subjects are relevant (e.g., @renelabounek's project), the specific analysis code could use that information (e.g., by selecting non-duplicate subjects based on their IDs rather than on the participant_id).
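With such a column, excluding repeat scans of the same person becomes a one-pass filter. A minimal awk sketch, assuming the new column is called `person_id` (a hypothetical name) and keeping the first row listed for each person; rows whose ID is empty, `n/a` or `-` fall back to their participant_id, so they are never merged with anyone.

```bash
#!/usr/bin/env bash
# unique_participants PARTICIPANTS_TSV [ID_COLUMN]
# Prints one participant_id per distinct person, keeping the first row seen.
# ID_COLUMN defaults to "person_id", a hypothetical column name.
unique_participants() {
  local tsv=$1 idcol=${2:-person_id}
  awk -F'\t' -v idcol="$idcol" '
    NR == 1 {
      for (i = 1; i <= NF; i++) idx[$i] = i
      if (!("participant_id" in idx) || !(idcol in idx)) {
        print "missing participant_id or " idcol " column" > "/dev/stderr"
        exit 1
      }
      next
    }
    {
      key = $idx[idcol]
      # Treat missing IDs as unique people rather than one shared person.
      if (key == "" || key == "n/a" || key == "-") key = $idx["participant_id"]
      if (!(key in seen)) { seen[key] = 1; print $idx["participant_id"] }
    }
  ' "$tsv"
}
```

Which of the three tokyo scans to keep is project-specific; keeping the first listed row is just the simplest deterministic rule, and a project could sort participants.tsv first to prefer a given scanner.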

May 12 '24 15:05 jcohenadad
