mne-python Support STIM channels in EDF export

trafficstars

I found some issues with our recently added EDF export that we might want to fix before the next release:

The error if EDFlib-Python is not installed is misleading, because it says RuntimeError: For exporting to EDF to work, the EDFlib module is needed, but it could not be imported. However, although the import name is EDFlib, the PyPI name is actually EDFlib-Python and EDFlib is a different PyPI package (which requires Cython etc.).
EDFlib-Python is super slow compared to pyEDFlib (which uses the C implementation). I tested exporting with one of our BDF files from our lab (425MB), and it took EDFlib-Python 1 minute and 52 seconds to export. For comparison, pyEDFlib took only 6 seconds.
Analog stim channels are not exported. This is essential in BDF files, because Biosemi amplifiers do not have discrete triggers and instead always write an analog stim channel. There is no technical reason to not export stim channels in addition to EEG (other than carefully setting the scaling, unit, and physical min/max). I've fixed this in my EDF/BDF export implementation in MNELAB recently (today) in https://github.com/cbrnr/mnelab/pull/230.

The first issue is probably easy to fix. The second issue is pretty serious, we can't expect that users wait almost 2 minutes for something that can be done in 6 seconds. I'd consider switching to pyEDFlib instead (they have binary wheels and AFAIR the restriction to use pure Python applies only to our project but not external dependencies). The third issue requires some work, but this is important for BDF files.

Oct 21 '21 13:10 cbrnr

@cbrnr Would you have time to work on this next week?

Oct 22 '21 17:10 hoechenberger

Probably not next week, maybe the week after. I have to finish a paper revision and some other stuff. But we should decide if we want to replace EDFlib-Python with pyEDFlib – what is everyone's opinion?

Oct 22 '21 17:10 cbrnr

It might make sense to do a quick profile and look at EDFlib-Python's code to see if it can easily be improved for example by using NumPy data writing functions rather than struct or anything like that. But I guess it also doesn't matter too much which lib we use, so I'm also fine with just switching to pyEDFlib if it's easy enough.

Oct 22 '21 17:10 larsoner

But we should decide if we want to replace EDFlib-Python with pyEDFlib – what is everyone's opinion?

I'd pick whichever seems to work best at the moment … if you believe pyEDFlib is a good choice, let's choose this. It's on conda-forge too, so installation shouldn't be a problem.

Oct 22 '21 17:10 hoechenberger

I think the slowness comes from our own loop here:

https://github.com/mne-tools/mne-python/blob/main/mne/export/_edf.py#L240-L268

pyEDFlib doesn't require users to loop over blocks, because it accepts a list of NumPy arrays (one element per channel). I'm pretty sure this can even be simplified to passing the whole 2D array in one go.

Oct 23 '21 10:10 cbrnr

if we can I would prefer to rely only on pure Python code. Compiled code requires more maintenance and binaries on many platforms.

my 2c

Oct 23 '21 17:10 agramfort

But it's an external package with wheels and an optional dependency. NumPy has tons of compiled code and we're not thinking about a pure Python alternative.

Oct 23 '21 19:10 cbrnr

And since it's an optional dependency, we can simply replace it if / once it turns out to become problematic. It won't affect our core functionality in any way...

Oct 24 '21 07:10 hoechenberger

Exactly. I think in this case the speedup might be worth the tradeoff.

Oct 24 '21 07:10 cbrnr

as long as someone does the maintenance I have no further objection

Oct 24 '21 07:10 agramfort

My only objection with pyedflib writing is that the end file never opened for me in EDFBrowser, which is my primary reason for working in EDF format. My export costs only occur once per file occasionally, so I"m okay with slow exports as long as they facilitate a downstream data analysis.

My export code at the time could've been incorrect tho.

Although maybe with the super fast data browser I've been peeking at, this could be eventually be irrelevant.

Oct 26 '21 16:10 adam2392

Exports with pyEDFlib worked fine when I tried it, all files opened in EDFbrowser with no problems.

Oct 26 '21 18:10 cbrnr

Do you have a code snippet I can try?

Oct 26 '21 18:10 adam2392

This takes 30 sec, if I vectorize the loop over channels it takes 3 sec but produces an incorrect result (at least for my first attempt) -- probably need a new method to write multiple channels simultaneously

import mne
import numpy as np
info = mne.create_info(64, 1000., 'eeg')
data = np.random.RandomState(0).randn(64, 1000000)
raw = mne.io.RawArray(data, info)
raw.export('test.edf')

Oct 26 '21 19:10 larsoner

MNELAB exports using pyEDFlib: https://github.com/cbrnr/mnelab/blob/main/mnelab/io/writers.py#L46

Oct 26 '21 19:10 cbrnr

MNELAB exports using pyEDFlib: https://github.com/cbrnr/mnelab/blob/main/mnelab/io/writers.py#L46

It seems you never explicitly close the file, is that unnecessary or actually forgotten?

Oct 26 '21 19:10 hoechenberger

Probably forgotten, thanks for the bug report 😄!

Oct 26 '21 19:10 cbrnr

I would like to add a second EDF export function that uses pyEDFlib, so export works if either that package or EDFlib-Python is installed. Is this possible with our utility functions? What is the preferred way to do this inside _edf.py? Somehow I'd like to have an explicit option to choose the package instead of trying to automatically decide based on installed packages (e.g. what happens if both packages are installed?).

Apr 26 '22 13:04 cbrnr

The discussion above makes it seem like it's fine just to switch to pyEDFlib, which seems preferable over having two code paths. We can always add the old code back later if it turns out to be problematic to use pyEDFlib for some reason.

Apr 26 '22 13:04 larsoner

OK, then I'll just replace it with my implementation.

Apr 26 '22 13:04 cbrnr

And by "just" I thought that it was going to be as easy as using my existing function, but I don't have the time to adapt all the tests and whatnot (I just tried). The export module is so convoluted that I can't "just" use my existing function. So if anyone wants to take this over please knock yourself out.

Apr 26 '22 14:04 cbrnr

@cbrnr Do you still have your function? I could try adapting the tests to it.

Sep 30 '23 19:09 withmywoessner

Yes, it's here: https://github.com/cbrnr/mnelab/blob/main/mnelab/io/writers.py#L57

Since it's been 1.5 years since I last tried to work on that, it might be worth checking how much functionality has been added to EDF export in MNE which is not present in my implementation.

Oct 01 '23 11:10 cbrnr

@cbrnr I wrote something that works using pyedflib. One problem is that the final output differs more than the test allows (test is 1e-5):

absolute tolerance: 0.01

Not equal to tolerance rtol=0, atol=0.01

Mismatched elements: 8540 / 14400 (59.3%)
Max absolute difference: 0.02457271
Max relative difference: 0.00102393
 x: array([0.000000e+00, 1.664960e-03, 3.329920e-03, ..., 2.397043e+01,
       2.397210e+01, 2.397376e+01])
 y: array([0.000000e+00, 1.666667e-03, 3.333333e-03, ..., 2.399500e+01,
       2.399667e+01, 2.399833e+01])

absolute tolerance: 0.1
Success!

Is a tolerance of .1 enough? I am not sure why this is happening. It might have to do with the fact that the testing data has a non integer sampling rate and the physical maxes and mins are of high precision (pyedflib only accepts up to 4 decimal places).

Oct 04 '23 18:10 withmywoessner

Which test is this? It depends on the scale of the data, but even if that's µV, I think a tolerance of 0.1 is not ... tolerable 😄.

Oct 05 '23 08:10 cbrnr

It is test_export_raw_edf. I saw in your code you used writeSamples. In mine, I replace the current write function with writePhysicalSamples. I am not sure why there is such a large difference in values... In my own testing using pyedflib.highlevel.write_edf. I also got values within .1 uV. Is it all right if I make a pull request?

Oct 05 '23 09:10 withmywoessner

Yes, please submit a PR and we can discuss the tolerance issue there!

Oct 05 '23 12:10 cbrnr

@cbrnr I made a PR whenever you have a chance to look :)

Oct 09 '23 01:10 withmywoessner

The first two issues have been fixed by switching to edfio. Now there's only one issue left, namely exporting STIM channels (I think MISC channels are already supported).

Jan 03 '24 20:01 cbrnr

mne-python mne-python copied to clipboard

Support STIM channels in EDF export

mne-python
mne-python copied to clipboard