openneuro
ds000113 nifti file content differs from original openfmri upload
We are attempting to reconvert the original data release of ds000113 (from the old openfmri days) from DICOMs in order to get maximum metadata and, importantly, more valid metadata.
In doing so, we investigated with @m-wierzba how the original upload to openfmri differs from the dataset that is presently downloadable from openneuro. We focused on the file content (checksum-based match), because the filenames and dataset layout have obviously changed.
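A checksum-based match of this kind can be sketched as follows. This is only an illustration of the idea, not the script we actually ran; the function names and the choice of SHA-256 are assumptions made for the example.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path, algo="sha256", chunk_size=1 << 20):
    """Return the hex digest of a file, read in chunks to bound memory use."""
    h = hashlib.new(algo)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def index_by_digest(root):
    """Map content digest -> list of relative paths found under root."""
    index = defaultdict(list)
    for p in sorted(Path(root).rglob("*")):
        if p.is_file():
            index[file_digest(p)].append(p.relative_to(root).as_posix())
    return dict(index)


def match_by_content(old_root, new_root):
    """Pair files with identical content, ignoring names and layout.

    Returns (matched, only_old, only_new), where matched maps a digest to
    the (old paths, new paths) that share it.
    """
    old, new = index_by_digest(old_root), index_by_digest(new_root)
    matched = {d: (old[d], new[d]) for d in old.keys() & new.keys()}
    only_old = sorted(p for d in old.keys() - new.keys() for p in old[d])
    only_new = sorted(p for d in new.keys() - old.keys() for p in new[d])
    return matched, only_old, only_new
```

Note that a checksum over a `.nii.gz` file also covers the gzip wrapper (which can embed a timestamp and file name), so a mismatch alone does not prove that the NIfTI content changed; that is why the header comparison below is needed.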
It seems that most NIfTI images have been altered. We suspect that some kind of header normalization procedure was applied. It would be instrumental for us to understand what exactly was done.
Below is the diff of the output of fslhd on one and the same file, comparing the original upload to the file currently offered for download.
Specifically, it seems that qform and sform have been equalized, an FSL tool replaced the (arguably more informative) description, and some numerical imprecision has slightly altered the image affine.
fslhd diff
--- oldhd.txt 2021-04-27 14:24:04.254049923 +0200
+++ newhd.txt 2021-04-27 14:24:19.638214798 +0200
@@ -1,4 +1,4 @@
-filename ../anondata/sub001/BOLD/task001_run004/bold.nii.gz
+filename ds000113/sub-01/ses-forrestgump/func/sub-01_ses-forrestgump_task-forrestgump_acq-raw_run-04_bold.nii.gz
sizeof_hdr 348
data_type INT16
@@ -18,7 +18,7 @@
pixdim0 0.000000
pixdim1 1.400000
pixdim2 1.400000
-pixdim3 1.539948
+pixdim3 1.539950
pixdim4 2.000000
pixdim5 0.000000
pixdim6 0.000000
@@ -45,23 +45,23 @@
intent_p3 0.000000
qform_name Scanner Anat
qform_code 1
-qto_xyz:1 -1.399780 0.009211 0.025353 105.693428
-qto_xyz:2 0.003949 1.366110 -0.336755 -66.909798
-qto_xyz:3 0.024505 0.306038 1.502462 -57.281765
+qto_xyz:1 -1.399780 0.009212 0.025360 105.693001
+qto_xyz:2 0.003948 1.366110 -0.336755 -66.909798
+qto_xyz:3 0.024512 0.306037 1.502464 -57.281799
qto_xyz:4 0.000000 0.000000 0.000000 1.000000
qform_xorient Right-to-Left
qform_yorient Posterior-to-Anterior
qform_zorient Inferior-to-Superior
-sform_name Unknown
-sform_code 0
-sto_xyz:1 0.000000 0.000000 0.000000 0.000000
-sto_xyz:2 0.000000 0.000000 0.000000 0.000000
-sto_xyz:3 0.000000 0.000000 0.000000 0.000000
-sto_xyz:4 0.000000 0.000000 0.000000 0.000000
-sform_xorient Unknown
-sform_yorient Unknown
-sform_zorient Unknown
+sform_name Scanner Anat
+sform_code 1
+sto_xyz:1 -1.399780 0.009212 0.025360 105.693001
+sto_xyz:2 0.003948 1.366110 -0.336755 -66.909798
+sto_xyz:3 0.024512 0.306037 1.502464 -57.281799
+sto_xyz:4 0.000000 0.000000 0.000000 1.000000
+sform_xorient Right-to-Left
+sform_yorient Posterior-to-Anterior
+sform_zorient Inferior-to-Superior
file_type NIFTI-1+
file_code 1
-descrip mi_ep2d_flashref_bold_160_iPat3_1.4mm_36sl_R4
+descrip FSL5.0
aux_file
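For anyone who wants to run the same comparison across many files without FSL, here is a minimal sketch that reads the relevant fields straight from the NIfTI-1 header using only the Python standard library. The byte offsets follow the NIfTI-1 header layout; the helper names are made up for this example, and unlike fslhd it reports the raw stored fields (quaternion parameters, srow_*) rather than the derived qto_xyz/sto_xyz matrices.

```python
import gzip
import struct

# (byte offset, struct format) of selected fields in the 348-byte NIfTI-1 header
_FIELDS = {
    "pixdim": (76, "8f"),
    "qform_code": (252, "h"),
    "sform_code": (254, "h"),
    "quatern_bcd": (256, "3f"),
    "qoffset_xyz": (268, "3f"),
    "srow_x": (280, "4f"),
    "srow_y": (296, "4f"),
    "srow_z": (312, "4f"),
}


def read_header_bytes(path):
    """Read the first 348 bytes of a .nii or .nii.gz file."""
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rb") as f:
        return f.read(348)


def parse_header(raw):
    """Decode selected header fields into a dict."""
    # sizeof_hdr must equal 348; use it to detect the byte order
    if struct.unpack_from("<i", raw, 0)[0] == 348:
        endian = "<"
    elif struct.unpack_from(">i", raw, 0)[0] == 348:
        endian = ">"
    else:
        raise ValueError("not a NIfTI-1 header (sizeof_hdr != 348)")
    out = {}
    for name, (offset, fmt) in _FIELDS.items():
        values = struct.unpack_from(endian + fmt, raw, offset)
        out[name] = values[0] if len(values) == 1 else values
    # descrip is an 80-byte, NUL-padded string at offset 148
    out["descrip"] = raw[148:228].split(b"\0", 1)[0].decode("latin-1")
    return out


def diff_headers(a, b):
    """Return {field: (value_in_a, value_in_b)} for fields that differ."""
    return {k: (a[k], b[k]) for k in a if a[k] != b[k]}
```

Usage would be along the lines of `diff_headers(parse_header(read_header_bytes(old_path)), parse_header(read_header_bytes(new_path)))`, where `old_path` and `new_path` are placeholders for the two versions of a file.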
If we understand what was done, and if we are successful in creating a modern day BIDS dataset from the original DICOMs and other raw material, we would like to propose that as an update of the present ds000113 -- as an attempt to reunite the currently disjoint histories of the datasets that we maintain.
Thx in advance!
This was before my time. I think @chrisgorgo and @suyashdb (possibly @jbwexler?) are most likely to remember what happened here. If I had to guess, it might just be that fslreorient2std was applied to all data.
I don't see anything in your old header that demands normalizing, such as leaked PHI, so would be +1 to reverting to the original files unless someone who remembers the event can provide a good argument.
Thanks for the swift reply!
re fslreorient2std: Great, we will try that out and see if it can explain the diff.
re update: I know that you only accept fast-forward changes. So here is what the change would look like -- if all goes well.
- added subdataset: referencing our project superdataset that binds together all studyforrest components
- datalad-run record, capturing a datalad-copyfile run that updates the files from a defined state of our canonical source superdataset
IOW two additional commits (per update, if we think long-term). This way the provenance link is precise, without forcing the repository history to become one. The linked subdataset (our superdataset) will live in some public place like github, so consumers can obtain it, and do further inspection, if desired.
Do you see issues caused by such an approach?
The main issues I see are:
- Where is the superdataset going to be placed? sourcedata/? Something .bidsignore'd?
- I'm not sure if we smoothly handle git submodules yet. That's definitely on the roadmap if it's not yet complete. @nellh would probably know best.
Thinking about it, --ff-only can incorporate merge commits, so I think that we can accept a merged history (again, @nellh is the authority), if you can get a clean merge and the resulting dataset validates. Force pushes are definitely disallowed, and I'm not positive about grafts but would recommend against trying anyway.
Would this be cleaner, or is the copyfile approach cleanest?
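To illustrate why a fast-forward-only policy does not rule out merge commits: a fast-forward only requires that the current branch tip is an ancestor of the new tip, whatever else the new history contains. The snippet below is a toy model of that rule on a hand-written commit graph; it is not git's implementation, the commit names are hypothetical, and whether such a push is actually accepted is still for @nellh to confirm.

```python
def ancestors(parents, tip):
    """Return the set of commits reachable from tip, including tip itself."""
    seen, stack = set(), [tip]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents.get(commit, ()))
    return seen


def can_fast_forward(parents, current_tip, new_tip):
    """A fast-forward only moves the branch pointer, so it is possible
    exactly when current_tip is already part of new_tip's history."""
    return current_tip in ancestors(parents, new_tip)


# Hypothetical history (commit -> list of parents):
#   A <- B        the published branch, currently at B
#   X <- Y        an unrelated line of history
#   M             a merge commit with parents B and Y
#   R             a rewrite of B on top of A (what a force push would bring)
history = {"A": [], "B": ["A"], "X": [], "Y": ["X"], "M": ["B", "Y"], "R": ["A"]}

print(can_fast_forward(history, "B", "M"))  # merge on top of B
print(can_fast_forward(history, "B", "Y"))  # unrelated history, no merge
print(can_fast_forward(history, "B", "R"))  # rewritten history
```

In this model only the first case is a fast-forward: the merge commit M keeps B in its ancestry, while Y and R do not.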
- Where is the superdataset going to be placed? sourcedata/? Something .bidsignore'd?
Yes, likely.
- I'm not sure if we smoothly handle git submodules yet. That's definitely on the roadmap if it's not yet complete. @nellh would probably know best.
Will keep that in mind. Thx!
Thinking about it, --ff-only can incorporate merge commits, so I think that we can accept a merged history (again, @nellh is the authority), if you can get a clean merge and the resulting dataset validates. Force pushes are definitely disallowed, and I'm not positive about grafts but would recommend against trying anyway.
Would this be cleaner, or is the copyfile approach cleanest?
The rationale for not merging history is that we do not have a single local repository that contains all the pieces. We have the four individual ones from which the original four openfmri datasets were created. The present openneuro ds000113 is a later amalgamation of those, not done by us (upstream). So we aim to provide a continuation for these two different types of entities, maintain an accurate dataset on openneuro, but also stay linked to the data descriptors that we can no longer change.
I searched my Slack and email history but unfortunately didn't find anything related to the alteration of NIfTI files of this particular dataset. I do recall there was a period when we commonly used fslreorient2std before rerunning pydeface if the initial pydeface failed. There was also a brief period when we would use FSL to change pixdim4 if it didn't match the paper, until we decided it was better to leave the header as is. I recall that changing pixdim4 would also alter other fields slightly.