How to handle datasets with invalid info[meas_id][secs]?

Open hoechenberger opened this issue 5 years ago • 11 comments

I'm working with the ds000246 OpenNeuro dataset:

$ aws s3 sync --no-sign-request s3://openneuro.org/ds000246 ds000246
$ cd ds000246/sub-emptyroom/meg

Reading the data works as expected:

import mne
raw = mne.io.read_raw_ctf('sub-emptyroom_task-noise_run-01_meg.ds')

Writing throws an exception:

raw.save('/tmp/foo.fif')

Traceback:

RuntimeError                              Traceback (most recent call last)
<ipython-input-4-eb369e79ee42> in <module>
----> 1 raw.save('/tmp/foo.fif')

<decorator-gen-155> in save(self, fname, picks, tmin, tmax, buffer_size_sec, drop_small_buffer, proj, fmt, overwrite, split_size, split_naming, verbose)

~/Development/mne-python/mne/io/base.py in save(self, fname, picks, tmin, tmax, buffer_size_sec, drop_small_buffer, proj, fmt, overwrite, split_size, split_naming, verbose)
   1379                 "split_naming must be either 'neuromag' or 'bids' instead "
   1380                 "of '{}'.".format(split_naming))
-> 1381         _write_raw(fname, self, info, picks, fmt, data_type, reset_range,
   1382                    start, stop, buffer_size, projector, drop_small_buffer,
   1383                    split_size, split_naming, part_idx, None, overwrite)

~/Development/mne-python/mne/io/base.py in _write_raw(fname, raw, info, picks, fmt, data_type, reset_range, start, stop, buffer_size, projector, drop_small_buffer, split_size, split_naming, part_idx, prev_fname, overwrite)
   1844 
   1845     picks = _picks_to_idx(info, picks, 'all', ())
-> 1846     fid, cals = _start_writing_raw(use_fname, info, picks, data_type,
   1847                                    reset_range, raw.annotations)
   1848 

~/Development/mne-python/mne/io/base.py in _start_writing_raw(name, info, sel, data_type, reset_range, annotations)
   2018         cals.append(info['chs'][k]['cal'] * info['chs'][k]['range'])
   2019 
-> 2020     write_meas_info(fid, info, data_type=data_type, reset_range=reset_range)
   2021 
   2022     #

~/Development/mne-python/mne/io/meas_info.py in write_meas_info(fid, info, data_type, reset_range)
   1453     """
   1454     info._check_consistency()
-> 1455     _check_dates(info)
   1456 
   1457     # Measurement info

~/Development/mne-python/mne/io/meas_info.py in _check_dates(info, prepend_error)
   1411                 if (value[key_2] < np.iinfo('>i4').min or
   1412                         value[key_2] > np.iinfo('>i4').max):
-> 1413                     raise RuntimeError('%sinfo[%s][%s] must be between '
   1414                                        '"%r" and "%r", got "%r"'
   1415                                        % (prepend_error, key, key_2,

RuntimeError: info[meas_id][secs] must be between "-2147483648" and "2147483647", got "-5364633480"

How to best deal with data like this? Can I simply set info[meas_id][secs] to an arbitrary (valid) value? Also it seems a little odd that I can create (and work with) some data by reading it, but then cannot write it back to disk…

hoechenberger avatar May 20 '20 20:05 hoechenberger

Also it seems a little odd that I can create (and work with) some data by reading it, but then cannot write it back to disk…

The FIF format in particular has a limit on how large a span of dates it can write because it writes out seconds in int32. Other formats that use other methods (e.g., storing seconds in int64, or dates in a suitable string format) will not suffer from this problem.
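To illustrate the int32 limit, here is a minimal sketch using only the Python standard library. It assumes the field is a big-endian signed 32-bit integer, matching the `>i4` dtype shown in the traceback above. The helper name `fits_in_int32` is hypothetical and not part of MNE.

```python
import struct

# Bounds of a signed 32-bit integer, as quoted in the error message.
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1


def fits_in_int32(seconds):
    """Return True if `seconds` can be packed as a big-endian signed int32."""
    try:
        struct.pack('>i', seconds)
    except struct.error:
        # struct refuses values outside the int32 range.
        return False
    return True


print(fits_in_int32(-5364633480))  # the value from the error message
print(fits_in_int32(0))            # a value that is safely in range
```

A format that stores seconds in 64 bits (`'>q'` in `struct` terms) would accept the same value, which is why only the write to FIF fails.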

As to how to fix it, you can set it to zero and things will work (unless you have saved separate annotations you want to add), but be careful if you ever want to do something having to do with dates across multiple subjects or runs. Typically during anonymization you shift all subjects and runs by some fixed amount so that their relative timings stay fixed. Wiping out the meas_date will make this no longer be the case.
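The point about relative timings can be sketched with plain `datetime` arithmetic. This is only an illustration: the run names, the dates, and the helper `shift_dates` are made up for the example and are not MNE API.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical measurement dates for two runs of the same subject.
run_dates = {
    'run-01': datetime(2020, 5, 20, 10, 0, tzinfo=timezone.utc),
    'run-02': datetime(2020, 5, 20, 11, 30, tzinfo=timezone.utc),
}


def shift_dates(dates, days_back):
    """Shift every date back by the same fixed number of days."""
    shift = timedelta(days=days_back)
    return {name: date - shift for name, date in dates.items()}


shifted = shift_dates(run_dates, days_back=3650)

# The gap between runs is unchanged by a common shift.
gap_before = run_dates['run-02'] - run_dates['run-01']
gap_after = shifted['run-02'] - shifted['run-01']
print(gap_before == gap_after)
```

Setting every date to the same constant instead would collapse both gaps to zero, which is the information loss described above.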

larsoner avatar May 21 '20 18:05 larsoner

For the record, this file comes from a dataset that is not BIDS-valid, as we made sure that dates for BIDS MEG are compatible with FIF.

agramfort avatar May 21 '20 21:05 agramfort

I would check what the date is. 5364633480 is about 170 years so my guess is that this data has been anonymized using some method that makes that value not meaningful.
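A quick way to check the date, assuming the value is seconds relative to the Unix epoch (1970-01-01 UTC). The arithmetic goes through `timedelta` rather than `datetime.fromtimestamp`, since the latter can fail for large negative values on some platforms.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
secs = -5364633480  # value reported in the RuntimeError

# Convert the offset in seconds into a calendar date.
meas_date = EPOCH + timedelta(seconds=secs)

# Rough size of the offset in years (365.25-day years).
years_before_epoch = abs(secs) / (365.25 * 24 * 3600)

print(meas_date.isoformat())
print(round(years_before_epoch))
```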

If you want to be extra cautious and preserve as much information as you can in case it is relevant, you could use raw.anonymize(), which should time-shift everything so that meas_date is in range while preserving the timedelta between meas_date and the other dates in the file.

https://mne.tools/stable/generated/mne.io.Raw.html#mne.io.Raw.anonymize

bloyl avatar May 21 '20 21:05 bloyl

It does pass validation with the BIDS validator though. We should probably file a bug report.

hoechenberger avatar May 21 '20 21:05 hoechenberger

The BIDS validator cannot read MEG files, only the file names, so it cannot detect these issues.

agramfort avatar May 22 '20 07:05 agramfort

The BIDS validator cannot read MEG files, only the file names, so it cannot detect these issues.

Wait, so you're saying there's BIDS-relevant metadata stored in a file format that the BIDS validator cannot read? Shouldn't this be stored in a sidecar file, like the events??

hoechenberger avatar May 22 '20 07:05 hoechenberger

Thanks @larsoner for the explanation, and thanks @bloyl for the suggestion to try and re-anonymize, I will look into this and see how it goes!

hoechenberger avatar May 22 '20 08:05 hoechenberger

This raises an interesting question.

What is the expectation if BIDS sidecar information differs from what is stored in the underlying imaging data headers?

bloyl avatar May 24 '20 19:05 bloyl

What is the expectation if bids sidecar information differs from what is stored in the underlying imaging data headers?

I believe the sidecar-based values always take precedence.

hoechenberger avatar May 24 '20 20:05 hoechenberger

I believe the sidecar-based values always take precedence.

+1

agramfort avatar May 25 '20 08:05 agramfort

Same issue here with the Temple University TUAR dataset. Ended up just dropping the meas_date.

davidcian avatar Dec 09 '23 14:12 davidcian