CTF's <blabla>.infods file contains information about the date and time of data acquisition

Open schoffelen opened this issue 3 years ago • 5 comments

I came across this while removing the date and time stamps from the *.res4 files of data that Kristijan and I are going to share with the rest of the world. I was surprised that the header returned by ft_read_header still contained the date of acquisition. hdr.orig.res4 is, of course, nicely stripped after running ctf_remove_datetime, but hdr.orig.infods still holds this information.

To reproduce:

cd /project/3011020.13/bids/sub-V1020/meg/sub-V1020_task-visual_meg.ds
x=ft_read_header('sub-V1020_task-visual_meg.ds');
x.orig.infods(41)

Alternatively, you can also read the 'sub-V1020_task-visual_meg.infods' text file...

I would assume that the .infods files are usually not removed from the data before sharing, so I would be game to write some voodoo shell script that can butcher the .infods file.
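As a rough illustration of what such a script could do, here is a minimal Python sketch (not part of bids-tools or FieldTrip). It assumes that timestamps in the .infods file are stored as 14-digit ASCII strings of the form YYYYMMDDhhmmss, like the value 20210409140701 that appears in an .infods dump later in this thread. Each hit is overwritten with a placeholder of the same length, so that no offsets or length fields in the binary file shift. The function name `scrub_timestamps` and the placeholder value are hypothetical.

```python
import re
from datetime import datetime
from typing import Tuple

# Assumption: a timestamp is exactly 14 ASCII digits (YYYYMMDDhhmmss),
# not embedded in a longer run of digits.
_STAMP = re.compile(rb"(?<!\d)\d{14}(?!\d)")
PLACEHOLDER = b"19000101000000"  # hypothetical value, same length as a timestamp


def scrub_timestamps(raw: bytes) -> Tuple[bytes, int]:
    """Return a copy of raw with valid 14-digit timestamps overwritten, plus the hit count."""
    count = 0

    def replace(match):
        nonlocal count
        try:
            datetime.strptime(match.group().decode("ascii"), "%Y%m%d%H%M%S")
        except ValueError:
            return match.group()  # 14 digits that do not form a valid date: leave untouched
        count += 1
        return PLACEHOLDER

    return _STAMP.sub(replace, raw), count
```

Applied to a file, this would amount to reading it with `Path.read_bytes()`, passing the bytes through `scrub_timestamps`, and writing the result back. Dates stored in any other format would not be caught, so the output should still be inspected by hand.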

schoffelen avatar Jun 22 '21 19:06 schoffelen

Should we continue with these scripts, or use https://www.fieldtriptoolbox.org/faq/how_can_i_anonymize_a_ctf_dataset/#using-matlab instead?

I recently used the latter for a Parkinson MEG dataset.

robertoostenveld avatar Jun 23 '21 08:06 robertoostenveld

It is this dataset: https://data.donders.ru.nl/collections/di/dccn/DSC_3018009.04_857. I checked one infods file, and it looked like this:

WS1_
_PATIENT_INFO
WS1_
_PATIENT_UID
_PATIENT_NAME_FIRST
_PATIENT_NAME_MIDDLE
_PATIENT_NAME_LAST
_PATIENT_ID   NOT FOR CLINICAL USE
_PATIENT_BIRTHDATE
_PATIENT_SEX
_PATIENT_PACS_NAME
_PATIENT_PACS_UID
_PATIENT_INSTITUTE   NOT FOR CLINICAL USE
EndOfParameters
_PROCEDURE_INFO
WS1_
_PROCEDURE_VERSION
_PROCEDURE_UID
_PROCEDURE_ACCESSIONNUMBER
_PROCEDURE_TITLE
_PROCEDURE_SITE
_PROCEDURE_STATUS
_PROCEDURE_TYPE
_PROCEDURE_STARTEDDATETIME
_PROCEDURE_CLOSEDDATETIME
_PROCEDURE_COMMENTS   writeCTFds  NOT FOR CLINICAL USE
_PROCEDURE_LOCATION
_PROCEDURE_ISINDB
EndOfParameters
_DATASET_INFO
WS1_
_DATASET_VERSION
_DATASET_UID
_DATASET_PATIENTUID
_DATASET_PROCEDUREUID
_DATASET_STATUS   writeCTFds  NOT FOR CLINICAL USE
_DATASET_RPFILE
_DATASET_PROCSTEPTITLE   run title  NOT FOR CLINICAL USE
_DATASET_PROCSTEPPROTOCOL
_DATASET_PROCSTEPDESCRIPTION
_DATASET_COLLECTIONDATETIME
_DATASET_COLLECTIONSOFTWARE   writeCTFds
_DATASET_CREATORDATETIME   20210409140701
_DATASET_CREATORSOFTWARE   writeCTFds
_DATASET_KEYWORDS
_DATASET_COMMENTS   NOT FOR CLINICAL USE
_DATASET_OPERATORNAME
_DATASET_LASTMODIFIEDDATETIME   20210409140701
_DATASET_NOMINALHCPOSITIONS
_DATASET_COEFSFILENAME
_DATASET_SENSORSFILENAME
_DATASET_SYSTEM
_DATASET_SYSTEMTYPE
_DATASET_LOWERBANDWIDTH
_DATASET_UPPERBANDWIDTH

robertoostenveld avatar Jun 23 '21 08:06 robertoostenveld

I think that using the referenced strategy is better. Yet for my current use case it feels like overkill, because it requires a full copy of the data to be created (i.e. no in-place update of the descriptors seems possible unless the code is hacked).

Also, from what I read in writeCTFds.m, it seems that the hz.ds/hz2.ds are not included in the output (although I am not sure whether this would be a problem).

schoffelen avatar Jun 29 '21 13:06 schoffelen

It also seems that the *.acq files may contain run_date and run_time.

writeCTFds uses writeCPersist to write the acq and infods files. This is a separate function (i.e. not a subfunction of writeCTFds), so it should be possible to overwrite these metadata files without rewriting the whole data directory.
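To get an overview of which files in a *.ds directory are affected, a quick scan for the field names mentioned in this thread may help. The following is a minimal Python sketch, not an existing tool. The marker list (run_date, run_time, DATETIME) comes from this discussion, and whether it covers every date field is an assumption. The function name `find_date_markers` and the size cutoff are hypothetical.

```python
from pathlib import Path
from typing import Dict, List

# Markers taken from this thread; completeness is an assumption.
MARKERS = (b"run_date", b"run_time", b"DATETIME")


def find_date_markers(ds_dir: Path, max_bytes: int = 10_000_000) -> Dict[str, List[str]]:
    """Map file name -> markers found, skipping subdirectories and files above max_bytes."""
    hits = {}
    for path in sorted(ds_dir.iterdir()):
        # Subdirectories (e.g. hz.ds) and large binary data files are not scanned.
        if not path.is_file() or path.stat().st_size > max_bytes:
            continue
        raw = path.read_bytes()
        found = [m.decode("ascii") for m in MARKERS if m in raw]
        if found:
            hits[path.name] = found
    return hits
```

A hit only shows that a field with that name exists in the file, not that it holds a value, so each reported file still needs a closer look.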

schoffelen avatar Jun 29 '21 13:06 schoffelen

OK, I have written a prototype function (inspired by the function that @robertoostenveld referred to above) that rewrites the files in the *.ds dir that contain dates and times, i.e. the res4, acq and infods, without the need to create a full copy of the binary data as well. Currently, my prototype function moves the originals into *.res4_old etc., but once we are sure that it works fine, I think the originals can be overwritten. Would it be an idea to use this code to refresh the referenced website, and/or to consider making this part of the standard bidsification procedure in data2bids?
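The prototype itself is not shown in this thread. The general pattern it describes (touch only the small metadata files, keep the originals under an _old name) could be sketched in Python roughly as follows. This is an illustration under stated assumptions, not the actual function: `rewrite_metadata` and its parameters are hypothetical, the `scrub` callable stands in for whatever format-specific cleaning is needed for each file type, and the original is copied to the _old name rather than moved.

```python
import shutil
from pathlib import Path
from typing import Callable, List

# Extensions named in this thread as carrying dates and times.
METADATA_SUFFIXES = (".res4", ".acq", ".infods")


def rewrite_metadata(ds_dir: Path, scrub: Callable[[bytes], bytes],
                     keep_backup: bool = True) -> List[Path]:
    """Scrub only the metadata files in a .ds directory; the binary data is never copied."""
    changed = []
    for path in sorted(ds_dir.iterdir()):
        if path.suffix not in METADATA_SUFFIXES:
            continue
        original = path.read_bytes()
        cleaned = scrub(original)
        if cleaned == original:
            continue  # nothing was removed, leave the file alone
        if keep_backup:
            # Same naming as described above: foo.res4 -> foo.res4_old
            shutil.copy2(path, path.with_name(path.name + "_old"))
        path.write_bytes(cleaned)
        changed.append(path)
    return changed
```

Note that the _old backups still contain the original dates, so they would have to be deleted before the dataset is shared.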

@KristijanArmeni I will soon do a full sweep of the Sherlock data to scrub it from date and time (and operator :) )

schoffelen avatar Jun 29 '21 14:06 schoffelen