How to handle datasets with invalid info[meas_id][secs]?
I'm working with the ds000246 OpenNeuro dataset:
$ aws s3 sync --no-sign-request s3://openneuro.org/ds000246 ds000246
$ cd ds000246/sub-emptyroom/meg
Reading the data works as expected:
import mne
raw = mne.io.read_raw_ctf('sub-emptyroom_task-noise_run-01_meg.ds')
Writing throws an exception:
raw.save('/tmp/foo.fif')
Traceback:
RuntimeError Traceback (most recent call last)
<ipython-input-4-eb369e79ee42> in <module>
----> 1 raw.save('/tmp/foo.fif')
<decorator-gen-155> in save(self, fname, picks, tmin, tmax, buffer_size_sec, drop_small_buffer, proj, fmt, overwrite, split_size, split_naming, verbose)
~/Development/mne-python/mne/io/base.py in save(self, fname, picks, tmin, tmax, buffer_size_sec, drop_small_buffer, proj, fmt, overwrite, split_size, split_naming, verbose)
1379 "split_naming must be either 'neuromag' or 'bids' instead "
1380 "of '{}'.".format(split_naming))
-> 1381 _write_raw(fname, self, info, picks, fmt, data_type, reset_range,
1382 start, stop, buffer_size, projector, drop_small_buffer,
1383 split_size, split_naming, part_idx, None, overwrite)
~/Development/mne-python/mne/io/base.py in _write_raw(fname, raw, info, picks, fmt, data_type, reset_range, start, stop, buffer_size, projector, drop_small_buffer, split_size, split_naming, part_idx, prev_fname, overwrite)
1844
1845 picks = _picks_to_idx(info, picks, 'all', ())
-> 1846 fid, cals = _start_writing_raw(use_fname, info, picks, data_type,
1847 reset_range, raw.annotations)
1848
~/Development/mne-python/mne/io/base.py in _start_writing_raw(name, info, sel, data_type, reset_range, annotations)
2018 cals.append(info['chs'][k]['cal'] * info['chs'][k]['range'])
2019
-> 2020 write_meas_info(fid, info, data_type=data_type, reset_range=reset_range)
2021
2022 #
~/Development/mne-python/mne/io/meas_info.py in write_meas_info(fid, info, data_type, reset_range)
1453 """
1454 info._check_consistency()
-> 1455 _check_dates(info)
1456
1457 # Measurement info
~/Development/mne-python/mne/io/meas_info.py in _check_dates(info, prepend_error)
1411 if (value[key_2] < np.iinfo('>i4').min or
1412 value[key_2] > np.iinfo('>i4').max):
-> 1413 raise RuntimeError('%sinfo[%s][%s] must be between '
1414 '"%r" and "%r", got "%r"'
1415 % (prepend_error, key, key_2,
RuntimeError: info[meas_id][secs] must be between "-2147483648" and "2147483647", got "-5364633480"
What's the best way to deal with data like this? Can I simply set info[meas_id][secs] to an arbitrary (valid) value? It also seems a little odd that I can create (and work with) some data by reading it, but then cannot write it back to disk…
The FIF format in particular has a limit on how large a span of dates it can write because it writes out seconds in int32. Other formats that use other methods (e.g., storing seconds in int64, or dates in a suitable string format) will not suffer from this problem.
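To see why this particular value fails, you can check it against the int32 range yourself (plain numpy, mirroring what _check_dates does):

import numpy as np

secs = -5364633480            # value stored in info['meas_id']['secs']
i32 = np.iinfo(np.int32)      # FIF stores this field as a 32-bit integer

print(i32.min, i32.max)                # -2147483648 2147483647
print(i32.min <= secs <= i32.max)      # False -> cannot be written to FIF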
As to how to fix it, you can set it to zero and things will work (unless you have saved separate annotations you want to add), but be careful if you ever need to work with dates across multiple subjects or runs. Typically during anonymization you shift all subjects and runs by some fixed amount so that their relative timings stay fixed; wiping out the meas_date means this will no longer be the case.
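For example, something along these lines should work (a sketch, mutating the timestamps in place; file_id is included too since it can carry the same out-of-range value):

import mne

raw = mne.io.read_raw_ctf('sub-emptyroom_task-noise_run-01_meg.ds')

# Replace the out-of-range timestamps with a valid placeholder (zero).
# This throws away the (apparently meaningless) recording date.
for key in ('meas_id', 'file_id'):
    if raw.info.get(key) is not None:
        raw.info[key]['secs'] = 0
        raw.info[key]['usecs'] = 0

raw.save('/tmp/foo.fif', overwrite=True)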
For the record, this file comes from a dataset that is not BIDS-valid; for BIDS MEG we made sure the dates are compatible with FIF.
I would check what the date is. 5364633480 seconds is about 170 years, so my guess is that this data has been anonymized using some method that makes that value not meaningful.
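For what it's worth, decoding that timestamp with plain Python suggests a placeholder date rather than a real one:

from datetime import datetime, timedelta, timezone

# info['meas_id']['secs'] counts seconds relative to the Unix epoch
epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
print(epoch + timedelta(seconds=-5364633480))
# -> 1800-01-01 08:02:00+00:00, i.e. presumably a sentinel date from anonymization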
If you want to be extra cautious, preserving as much information as you can in case it is relevant, you could use raw.anonymize(), which should time-shift everything so that meas_date is in range while preserving the timedelta between meas_date and the other dates in the file.
https://mne.tools/stable/generated/mne.io.Raw.html#mne.io.Raw.anonymize
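In code, roughly (a sketch; by default anonymize() moves meas_date to a fixed reference date and shifts the other timestamps by the same offset, and a daysback argument is available if you want to control the shift yourself):

import mne

raw = mne.io.read_raw_ctf('sub-emptyroom_task-noise_run-01_meg.ds')

# Shift all dates by a common offset so meas_date lands in a writable
# range while relative timings within the file are preserved.
raw.anonymize()    # or raw.anonymize(daysback=...) to choose the shift

raw.save('/tmp/foo.fif', overwrite=True)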
It does pass validation with the BIDS validator though. We should probably file a bug report.
The BIDS validator cannot read MEG files, only the file names, so it cannot detect these issues.
Wait, so you're saying there's BIDS-relevant metadata stored in a file format that the BIDS validator cannot read? Shouldn't this be stored in a sidecar file, like the events??
Thanks @larsoner for the explanation, and thanks @bloyl for the suggestion to try and re-anonymize, I will look into this and see how it goes!
This raises an interesting question.
What is the expectation if bids sidecar information differs from what is stored in the underlying imaging data headers?
I believe the sidecar-based values always take precedence.
+1
Same issue here with the Temple University TUAR dataset. Ended up just dropping the meas_date.
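Roughly like this (a sketch; the file name is hypothetical, and set_meas_date(None) should also reset the related info['meas_id'] / info['file_id'] timestamps):

import mne

# Hypothetical path to one of the TUAR EDF recordings
raw = mne.io.read_raw_edf('tuar_recording.edf')

# Drop the acquisition date entirely before writing to FIF
raw.set_meas_date(None)

raw.save('/tmp/tuar_raw.fif', overwrite=True)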