mne-python
mne-python copied to clipboard
Support STIM channels in EDF export
I found some issues with our recently added EDF export that we might want to fix before the next release:
- The error if
EDFlib-Python
is not installed is misleading, because it saysRuntimeError: For exporting to EDF to work, the EDFlib module is needed, but it could not be imported
. However, although the import name isEDFlib
, the PyPI name is actuallyEDFlib-Python
andEDFlib
is a different PyPI package (which requires Cython etc.). -
EDFlib-Python
is super slow compared topyEDFlib
(which uses the C implementation). I tested exporting with one of our BDF files from our lab (425MB), and it tookEDFlib-Python
1 minute and 52 seconds to export. For comparison,pyEDFlib
took only 6 seconds. - Analog stim channels are not exported. This is essential in BDF files, because Biosemi amplifiers do not have discrete triggers and instead always write an analog stim channel. There is no technical reason to not export stim channels in addition to EEG (other than carefully setting the scaling, unit, and physical min/max). I've fixed this in my EDF/BDF export implementation in MNELAB recently (today) in https://github.com/cbrnr/mnelab/pull/230.
The first issue is probably easy to fix. The second issue is pretty serious, we can't expect that users wait almost 2 minutes for something that can be done in 6 seconds. I'd consider switching to pyEDFlib
instead (they have binary wheels and AFAIR the restriction to use pure Python applies only to our project but not external dependencies). The third issue requires some work, but this is important for BDF files.
@cbrnr Would you have time to work on this next week?
Probably not next week, maybe the week after. I have to finish a paper revision and some other stuff. But we should decide if we want to replace EDFlib-Python
with pyEDFlib
– what is everyone's opinion?
It might make sense to do a quick profile and look at EDFlib-Python's code to see if it can easily be improved for example by using NumPy data writing functions rather than struct
or anything like that. But I guess it also doesn't matter too much which lib we use, so I'm also fine with just switching to pyEDFlib
if it's easy enough.
But we should decide if we want to replace
EDFlib-Python
withpyEDFlib
– what is everyone's opinion?
I'd pick whichever seems to work best at the moment … if you believe pyEDFlib
is a good choice, let's choose this. It's on conda-forge
too, so installation shouldn't be a problem.
I think the slowness comes from our own loop here:
https://github.com/mne-tools/mne-python/blob/main/mne/export/_edf.py#L240-L268
pyEDFlib
doesn't require users to loop over blocks, because it accepts a list of NumPy arrays (one element per channel). I'm pretty sure this can even be simplified to passing the whole 2D array in one go.
if we can I would prefer to rely only on pure Python code. Compiled code requires more maintenance and binaries on many platforms.
my 2c
But it's an external package with wheels and an optional dependency. NumPy has tons of compiled code and we're not thinking about a pure Python alternative.
And since it's an optional dependency, we can simply replace it if / once it turns out to become problematic. It won't affect our core functionality in any way...
Exactly. I think in this case the speedup might be worth the tradeoff.
as long as someone does the maintenance I have no further objection
My only objection with pyedflib writing is that the end file never opened for me in EDFBrowser, which is my primary reason for working in EDF format. My export costs only occur once per file occasionally, so I"m okay with slow exports as long as they facilitate a downstream data analysis.
My export code at the time could've been incorrect tho.
Although maybe with the super fast data browser I've been peeking at, this could be eventually be irrelevant.
Exports with pyEDFlib worked fine when I tried it, all files opened in EDFbrowser with no problems.
Do you have a code snippet I can try?
This takes 30 sec, if I vectorize the loop over channels it takes 3 sec but produces an incorrect result (at least for my first attempt) -- probably need a new method to write multiple channels simultaneously
import mne
import numpy as np
info = mne.create_info(64, 1000., 'eeg')
data = np.random.RandomState(0).randn(64, 1000000)
raw = mne.io.RawArray(data, info)
raw.export('test.edf')
MNELAB exports using pyEDFlib: https://github.com/cbrnr/mnelab/blob/main/mnelab/io/writers.py#L46
MNELAB exports using pyEDFlib: https://github.com/cbrnr/mnelab/blob/main/mnelab/io/writers.py#L46
It seems you never explicitly close the file, is that unnecessary or actually forgotten?
Probably forgotten, thanks for the bug report 😄!
I would like to add a second EDF export function that uses pyEDFlib
, so export works if either that package or EDFlib-Python
is installed. Is this possible with our utility functions? What is the preferred way to do this inside _edf.py
? Somehow I'd like to have an explicit option to choose the package instead of trying to automatically decide based on installed packages (e.g. what happens if both packages are installed?).
The discussion above makes it seem like it's fine just to switch to pyEDFlib
, which seems preferable over having two code paths. We can always add the old code back later if it turns out to be problematic to use pyEDFlib
for some reason.
OK, then I'll just replace it with my implementation.
And by "just" I thought that it was going to be as easy as using my existing function, but I don't have the time to adapt all the tests and whatnot (I just tried). The export module is so convoluted that I can't "just" use my existing function. So if anyone wants to take this over please knock yourself out.