mne-python icon indicating copy to clipboard operation
mne-python copied to clipboard

Support STIM channels in EDF export

Open cbrnr opened this issue 2 years ago • 29 comments

I found some issues with our recently added EDF export that we might want to fix before the next release:

  1. The error if EDFlib-Python is not installed is misleading, because it says RuntimeError: For exporting to EDF to work, the EDFlib module is needed, but it could not be imported. However, although the import name is EDFlib, the PyPI name is actually EDFlib-Python and EDFlib is a different PyPI package (which requires Cython etc.).
  2. EDFlib-Python is super slow compared to pyEDFlib (which uses the C implementation). I tested exporting with one of our BDF files from our lab (425MB), and it took EDFlib-Python 1 minute and 52 seconds to export. For comparison, pyEDFlib took only 6 seconds.
  3. Analog stim channels are not exported. This is essential in BDF files, because Biosemi amplifiers do not have discrete triggers and instead always write an analog stim channel. There is no technical reason to not export stim channels in addition to EEG (other than carefully setting the scaling, unit, and physical min/max). I've fixed this in my EDF/BDF export implementation in MNELAB recently (today) in https://github.com/cbrnr/mnelab/pull/230.

The first issue is probably easy to fix. The second issue is pretty serious, we can't expect that users wait almost 2 minutes for something that can be done in 6 seconds. I'd consider switching to pyEDFlib instead (they have binary wheels and AFAIR the restriction to use pure Python applies only to our project but not external dependencies). The third issue requires some work, but this is important for BDF files.

cbrnr avatar Oct 21 '21 13:10 cbrnr

@cbrnr Would you have time to work on this next week?

hoechenberger avatar Oct 22 '21 17:10 hoechenberger

Probably not next week, maybe the week after. I have to finish a paper revision and some other stuff. But we should decide if we want to replace EDFlib-Python with pyEDFlib – what is everyone's opinion?

cbrnr avatar Oct 22 '21 17:10 cbrnr

It might make sense to do a quick profile and look at EDFlib-Python's code to see if it can easily be improved for example by using NumPy data writing functions rather than struct or anything like that. But I guess it also doesn't matter too much which lib we use, so I'm also fine with just switching to pyEDFlib if it's easy enough.

larsoner avatar Oct 22 '21 17:10 larsoner

But we should decide if we want to replace EDFlib-Python with pyEDFlib – what is everyone's opinion?

I'd pick whichever seems to work best at the moment … if you believe pyEDFlib is a good choice, let's choose this. It's on conda-forge too, so installation shouldn't be a problem.

hoechenberger avatar Oct 22 '21 17:10 hoechenberger

I think the slowness comes from our own loop here:

https://github.com/mne-tools/mne-python/blob/main/mne/export/_edf.py#L240-L268

pyEDFlib doesn't require users to loop over blocks, because it accepts a list of NumPy arrays (one element per channel). I'm pretty sure this can even be simplified to passing the whole 2D array in one go.

cbrnr avatar Oct 23 '21 10:10 cbrnr

if we can I would prefer to rely only on pure Python code. Compiled code requires more maintenance and binaries on many platforms.

my 2c

agramfort avatar Oct 23 '21 17:10 agramfort

But it's an external package with wheels and an optional dependency. NumPy has tons of compiled code and we're not thinking about a pure Python alternative.

cbrnr avatar Oct 23 '21 19:10 cbrnr

And since it's an optional dependency, we can simply replace it if / once it turns out to become problematic. It won't affect our core functionality in any way...

hoechenberger avatar Oct 24 '21 07:10 hoechenberger

Exactly. I think in this case the speedup might be worth the tradeoff.

cbrnr avatar Oct 24 '21 07:10 cbrnr

as long as someone does the maintenance I have no further objection

agramfort avatar Oct 24 '21 07:10 agramfort

My only objection with pyedflib writing is that the end file never opened for me in EDFBrowser, which is my primary reason for working in EDF format. My export costs only occur once per file occasionally, so I"m okay with slow exports as long as they facilitate a downstream data analysis.

My export code at the time could've been incorrect tho.

Although maybe with the super fast data browser I've been peeking at, this could be eventually be irrelevant.

adam2392 avatar Oct 26 '21 16:10 adam2392

Exports with pyEDFlib worked fine when I tried it, all files opened in EDFbrowser with no problems.

cbrnr avatar Oct 26 '21 18:10 cbrnr

Do you have a code snippet I can try?

adam2392 avatar Oct 26 '21 18:10 adam2392

This takes 30 sec, if I vectorize the loop over channels it takes 3 sec but produces an incorrect result (at least for my first attempt) -- probably need a new method to write multiple channels simultaneously

import mne
import numpy as np
info = mne.create_info(64, 1000., 'eeg')
data = np.random.RandomState(0).randn(64, 1000000)
raw = mne.io.RawArray(data, info)
raw.export('test.edf')

larsoner avatar Oct 26 '21 19:10 larsoner

MNELAB exports using pyEDFlib: https://github.com/cbrnr/mnelab/blob/main/mnelab/io/writers.py#L46

cbrnr avatar Oct 26 '21 19:10 cbrnr

MNELAB exports using pyEDFlib: https://github.com/cbrnr/mnelab/blob/main/mnelab/io/writers.py#L46

It seems you never explicitly close the file, is that unnecessary or actually forgotten?

hoechenberger avatar Oct 26 '21 19:10 hoechenberger

Probably forgotten, thanks for the bug report 😄!

cbrnr avatar Oct 26 '21 19:10 cbrnr

I would like to add a second EDF export function that uses pyEDFlib, so export works if either that package or EDFlib-Python is installed. Is this possible with our utility functions? What is the preferred way to do this inside _edf.py? Somehow I'd like to have an explicit option to choose the package instead of trying to automatically decide based on installed packages (e.g. what happens if both packages are installed?).

cbrnr avatar Apr 26 '22 13:04 cbrnr

The discussion above makes it seem like it's fine just to switch to pyEDFlib, which seems preferable over having two code paths. We can always add the old code back later if it turns out to be problematic to use pyEDFlib for some reason.

larsoner avatar Apr 26 '22 13:04 larsoner

OK, then I'll just replace it with my implementation.

cbrnr avatar Apr 26 '22 13:04 cbrnr

And by "just" I thought that it was going to be as easy as using my existing function, but I don't have the time to adapt all the tests and whatnot (I just tried). The export module is so convoluted that I can't "just" use my existing function. So if anyone wants to take this over please knock yourself out.

cbrnr avatar Apr 26 '22 14:04 cbrnr