mne-python
mne-python copied to clipboard
Support STIM channels in EDF export
I found some issues with our recently added EDF export that we might want to fix before the next release:
- The error if
EDFlib-Pythonis not installed is misleading, because it saysRuntimeError: For exporting to EDF to work, the EDFlib module is needed, but it could not be imported. However, although the import name isEDFlib, the PyPI name is actuallyEDFlib-PythonandEDFlibis a different PyPI package (which requires Cython etc.). EDFlib-Pythonis super slow compared topyEDFlib(which uses the C implementation). I tested exporting with one of our BDF files from our lab (425MB), and it tookEDFlib-Python1 minute and 52 seconds to export. For comparison,pyEDFlibtook only 6 seconds.- Analog stim channels are not exported. This is essential in BDF files, because Biosemi amplifiers do not have discrete triggers and instead always write an analog stim channel. There is no technical reason to not export stim channels in addition to EEG (other than carefully setting the scaling, unit, and physical min/max). I've fixed this in my EDF/BDF export implementation in MNELAB recently (today) in https://github.com/cbrnr/mnelab/pull/230.
The first issue is probably easy to fix. The second issue is pretty serious, we can't expect that users wait almost 2 minutes for something that can be done in 6 seconds. I'd consider switching to pyEDFlib instead (they have binary wheels and AFAIR the restriction to use pure Python applies only to our project but not external dependencies). The third issue requires some work, but this is important for BDF files.
@cbrnr Would you have time to work on this next week?
Probably not next week, maybe the week after. I have to finish a paper revision and some other stuff. But we should decide if we want to replace EDFlib-Python with pyEDFlib – what is everyone's opinion?
It might make sense to do a quick profile and look at EDFlib-Python's code to see if it can easily be improved for example by using NumPy data writing functions rather than struct or anything like that. But I guess it also doesn't matter too much which lib we use, so I'm also fine with just switching to pyEDFlib if it's easy enough.
But we should decide if we want to replace
EDFlib-PythonwithpyEDFlib– what is everyone's opinion?
I'd pick whichever seems to work best at the moment … if you believe pyEDFlib is a good choice, let's choose this. It's on conda-forge too, so installation shouldn't be a problem.
I think the slowness comes from our own loop here:
https://github.com/mne-tools/mne-python/blob/main/mne/export/_edf.py#L240-L268
pyEDFlib doesn't require users to loop over blocks, because it accepts a list of NumPy arrays (one element per channel). I'm pretty sure this can even be simplified to passing the whole 2D array in one go.
if we can I would prefer to rely only on pure Python code. Compiled code requires more maintenance and binaries on many platforms.
my 2c
But it's an external package with wheels and an optional dependency. NumPy has tons of compiled code and we're not thinking about a pure Python alternative.
And since it's an optional dependency, we can simply replace it if / once it turns out to become problematic. It won't affect our core functionality in any way...
Exactly. I think in this case the speedup might be worth the tradeoff.
as long as someone does the maintenance I have no further objection
My only objection with pyedflib writing is that the end file never opened for me in EDFBrowser, which is my primary reason for working in EDF format. My export costs only occur once per file occasionally, so I"m okay with slow exports as long as they facilitate a downstream data analysis.
My export code at the time could've been incorrect tho.
Although maybe with the super fast data browser I've been peeking at, this could be eventually be irrelevant.
Exports with pyEDFlib worked fine when I tried it, all files opened in EDFbrowser with no problems.
Do you have a code snippet I can try?
This takes 30 sec, if I vectorize the loop over channels it takes 3 sec but produces an incorrect result (at least for my first attempt) -- probably need a new method to write multiple channels simultaneously
import mne
import numpy as np
info = mne.create_info(64, 1000., 'eeg')
data = np.random.RandomState(0).randn(64, 1000000)
raw = mne.io.RawArray(data, info)
raw.export('test.edf')
MNELAB exports using pyEDFlib: https://github.com/cbrnr/mnelab/blob/main/mnelab/io/writers.py#L46
MNELAB exports using pyEDFlib: https://github.com/cbrnr/mnelab/blob/main/mnelab/io/writers.py#L46
It seems you never explicitly close the file, is that unnecessary or actually forgotten?
Probably forgotten, thanks for the bug report 😄!
I would like to add a second EDF export function that uses pyEDFlib, so export works if either that package or EDFlib-Python is installed. Is this possible with our utility functions? What is the preferred way to do this inside _edf.py? Somehow I'd like to have an explicit option to choose the package instead of trying to automatically decide based on installed packages (e.g. what happens if both packages are installed?).
The discussion above makes it seem like it's fine just to switch to pyEDFlib, which seems preferable over having two code paths. We can always add the old code back later if it turns out to be problematic to use pyEDFlib for some reason.
OK, then I'll just replace it with my implementation.
And by "just" I thought that it was going to be as easy as using my existing function, but I don't have the time to adapt all the tests and whatnot (I just tried). The export module is so convoluted that I can't "just" use my existing function. So if anyone wants to take this over please knock yourself out.
@cbrnr Do you still have your function? I could try adapting the tests to it.
Yes, it's here: https://github.com/cbrnr/mnelab/blob/main/mnelab/io/writers.py#L57
Since it's been 1.5 years since I last tried to work on that, it might be worth checking how much functionality has been added to EDF export in MNE which is not present in my implementation.
@cbrnr I wrote something that works using pyedflib. One problem is that the final output differs more than the test allows (test is 1e-5):
absolute tolerance: 0.01
Not equal to tolerance rtol=0, atol=0.01
Mismatched elements: 8540 / 14400 (59.3%)
Max absolute difference: 0.02457271
Max relative difference: 0.00102393
x: array([0.000000e+00, 1.664960e-03, 3.329920e-03, ..., 2.397043e+01,
2.397210e+01, 2.397376e+01])
y: array([0.000000e+00, 1.666667e-03, 3.333333e-03, ..., 2.399500e+01,
2.399667e+01, 2.399833e+01])
absolute tolerance: 0.1
Success!
Is a tolerance of .1 enough? I am not sure why this is happening. It might have to do with the fact that the testing data has a non integer sampling rate and the physical maxes and mins are of high precision (pyedflib only accepts up to 4 decimal places).
Which test is this? It depends on the scale of the data, but even if that's µV, I think a tolerance of 0.1 is not ... tolerable 😄.
It is test_export_raw_edf. I saw in your code you used writeSamples. In mine, I replace the current write function with writePhysicalSamples. I am not sure why there is such a large difference in values...
In my own testing using pyedflib.highlevel.write_edf. I also got values within .1 uV. Is it all right if I make a pull request?
Yes, please submit a PR and we can discuss the tolerance issue there!
@cbrnr I made a PR whenever you have a chance to look :)
The first two issues have been fixed by switching to edfio. Now there's only one issue left, namely exporting STIM channels (I think MISC channels are already supported).