mne-python icon indicating copy to clipboard operation
mne-python copied to clipboard

read_raw_edf silently loads EDF+D files without accounting for time gaps/acquisition skips between records

Open Riley16 opened this issue 3 months ago • 3 comments

Description of the problem

MNE (v1.10.1) appears to load EDF+D files (with actually discontinuous recordings) without accounting for gaps or acquisition skips between records. For context, EDF+D files are supported by the official EDF+ format and allow for storing records that are non-contiguous in a single EDF+ file. I'm able to load EDF+D files with gaps in them and the resulting RawEDF.times show a continuous array of sample times, with no reference to the gap I intentionally inserted in my test EDF+D file. I generated this test EEG with lunapi in a simple repo I'm writing for robustly de-identifying EDFs. I've attached the discontinuous EDF+D file (renamed to .txt extension to get around extension restrictions).

basic_EDF_D.txt

The issue of discontinuous EDFs is referenced in the forums with no solution or recognition of the fact that EDF+D will load without accounting for gaps.

Supporting EDF+D would require more work, and most packages (pyedflib, EDFBrowser, edfio) do not support this format (the only package I've found that does is luna/lunapi, which is somewhat specialized to sleep research). However, MNE silently loading an EDF+D file as if it had no gaps when in fact it does is a potentially serious error.

MNE should be updated in the short term to error out upon loading EDF+D files (or at least upon loading EDF+D files that have gaps; some EDF+D files, such as those produced by Nihon Kohden clinical recording systems, are in fact continuous / have no gaps, and could be marked as EDF+C files with no loss of functionality as far as I or lunapi can tell; for MNE to support this subset of EDF+D files would require checking the Time-stamped Annotations Lists (TALs) in an 'EDF Annotations' signal).

EDF+ files are denoted as EDF+C or EDF+D by the "reserved" header field saying "EDF+C" or "EDF+D", so this is an easy fix.

Steps to reproduce

# I doubt there is any data in the MNE-Python example datasets for this issue
# to generate the discontinuous EDF+D I created (and attached with the extension renamed to .txt):
git clone [email protected]:Riley16/clean_eeg.git
pip install -e clean_eeg/
python tests/generate_edf.py

# to read it with MNE in that repo:
python src/clean_eeg/load_eeg.py --path tests/test_data/basic_EDF_D.edf --load-method mne --verbosity 3

# In short, this EDF file is 5 seconds long and had the 3rd second dropped from the record to create an acquisition skip / gap for testing with EDF+D files.

# Modify src/clean_eeg/load_eeg.py:print_edf_mne() for additional debugging outputs.

Link to data

No response

Expected results

MNE should error out upon loading the unsupported EDF+D variant of the EDF+ standard rather than ignoring acquisition skips / gaps in non-contiguous records.

Actual results

MNE does not error out, and RawEDF.times shows times that ignore the gaps (continuously count up from t = 0 to t = 4 seconds) AND the last annotation, which occurs at time 4.5 seconds (so past the duration of the EDF file that MNE sees) is dropped, even though this annotation actually occurs in the last half second of the recorded data.

Additional information

Platform macOS-10.16-x86_64-i386-64bit Python 3.11.9 (main, Apr 19 2024, 11:44:45) [Clang 14.0.6 ] Executable /Users/rdehaan/anaconda3/envs/clean_eeg2/bin/python CPU Intel(R) Xeon(R) W-3223 CPU @ 3.50GHz machdep.cpu.family: 6 machdep.cpu.model: 85 machdep.cpu.extmodel: 5 machdep.cpu.extfamily: 0 machdep.cpu.stepping: 7 machdep.cpu.feature_bits: 9223086162756500479 machdep.cpu.leaf7_feature_bits: 3550476267 2056 machdep.cpu.leaf7_feature_bits_edx: 3154117632 machdep.cpu.extfeature_bits: 1241984796928 machdep.cpu.signature: 329303 machdep.cpu.brand: 0 machdep.cpu.features: FPU VME DE PSE TSC MSR PAE MCE CX8 APIC SEP MTRR PGE MCA CMOV PAT PSE36 CLFSH DS ACPI MMX FXSR SSE SSE2 SS HTT TM PBE SSE3 PCLMULQDQ DTES64 MON DSCPL VMX SMX EST TM2 SSSE3 FMA CX16 TPR PDCM SSE4.1 SSE4.2 x2APIC MOVBE POPCNT AES PCID XSAVE OSXSAVE SEGLIM64 TSCTMR AVX1.0 RDRAND F16C machdep.cpu.leaf7_features: RDWRFSGS TSC_THREAD_OFFSET BMI1 AVX2 FDPEO SMEP BMI2 ERMS INVPCID PQM FPU_CSDS MPX PQE AVX512F AVX512DQ RDSEED ADX SMAP CLFSOPT CLWB IPT AVX512CD AVX512BW AVX512VL PKU AVX512VNNI MDCLEAR IBRS STIBP L1DF ACAPMSR SSBD machdep.cpu.extfeatures: SYSCALL XD 1GBPAGE EM64T LAHF LZCNT PREFETCHW RDTSCP TSCI machdep.cpu.logical_per_package: 32 machdep.cpu.cores_per_package: 16 machdep.cpu.microcode_version: 83900673 machdep.cpu.processor_flag: 1 machdep.cpu.core_count: 8 machdep.cpu.thread_count: 16 (16 cores) Memory 96.0 GiB

Core ├☑ mne 1.10.1 (latest release) ├☑ numpy 2.3.2 (unknown linalg bindings (threadpoolctl module not found: No module named 'threadpoolctl')) ├☑ scipy 1.16.1 └☑ matplotlib 3.10.5 (backend=macosx)

Numerical (optional) ├☑ pandas 2.3.1 └☐ unavailable sklearn, numba, nibabel, nilearn, dipy, openmeeg, cupy, h5io, h5py

Visualization (optional) ├☑ ipywidgets 8.1.7 └☐ unavailable pyvista, pyvistaqt, vtk, qtpy, ipympl, pyqtgraph, mne-qt-browser, trame_client, trame_server, trame_vtk, trame_vuetify

Ecosystem (optional) ├☑ edfio 0.4.9 └☐ unavailable mne-bids, mne-nirs, mne-features, mne-connectivity, mne-icalabel, mne-bids-pipeline, neo, eeglabio, mffpy, pybv

Riley16 avatar Sep 25 '25 19:09 Riley16

Hello! 👋 Thanks for opening your first issue here! ❤️ We will try to get back to you soon. 🚴

welcome[bot] avatar Sep 25 '25 19:09 welcome[bot]

MNE should error out upon loading the unsupported EDF+D variant of the EDF+ standard rather than ignoring acquisition skips / gaps in non-contiguous records.

For other File formats, when there are acquisition gaps, MNE will add a 0-duration annotation with the description BAD_ACQ_SKIP. This is a special keyword, for example the filtering functions use it to avoid filtering over data discontinuities.

I think that if we place BAD_ACQ_SKIP annotations where needed when reading EDF+D files, then you should be able to safely work with the data in MNE.

Can you fork the mne-python repository, create a new branch and see if you can easily add a BAD_ACQ_SKIP annotation at the time of these gaps? Then you can create a little test with your test data, which should pass on your branch but fail on main.

scott-huberty avatar Sep 25 '25 22:09 scott-huberty

For other File formats, when there are acquisition gaps, MNE will add a 0-duration annotation with the description BAD_ACQ_SKIP.

IIUC this won't be enough. It will be an improvement, but I think you'll lose the time domain. Might need to do something similar to what @scott-huberty did with Eyelink: fill in a big span of nans across all channels to spoof the missing samples in each gap.

drammock avatar Sep 25 '25 22:09 drammock