mne-python icon indicating copy to clipboard operation
mne-python copied to clipboard

[draft] new reader for curry files, using curry-python-reader code

Open dominikwelke opened this issue 9 months ago • 30 comments

hi all as discussed in #13033 here a draft PR to use the official curry reader code.

in a first step i just use the reader as a module (only fixed formatting to force it past pre-commit). it has some drawbacks (e.g. data is always loaded) and i did not implement all possible data yet (eg hpi, or epoched recordings) but in general it already works pretty well. tested it with all their example data, and one of my own recordings that didnt work in mne before.

it would be great to get some feedback how you want me to proceed with this @drammock @larsoner:

  • do we want to stick with the module approach, leave their code untouched and work with the output (would allow easier updating when they push changes)
  • or should i merge the code more thoroughly. making it easier to maintain and in terms of clarity

BACKGROUND: the curry data reader currently cant handle all/newer curry files plan is port code from the official curry reader into mne-python

for permission see #12855

closes #12795 #13033 #12855

dominikwelke avatar Mar 26 '25 20:03 dominikwelke

do we want to stick with the module approach, leave their code untouched and work with the output (would allow easier updating when they push changes), or should i merge the code more thoroughly. making it easier to maintain and in terms of clarity

Given that @CurryKaiser refused our offer to help them package up their reader for PyPI / conda-forge, I see two remaining options:

  1. "vendor" their code. To make it slightly future-proof, we could write a script (in tools/ I guess) that fetches the latest code from their repo, auto-formats it to make it compatible with MNE's pre-commit requirements, and puts the (formatted but otherwise unmodified) code in mne/io/curry/_vendored.py. (This is basically a manual version of git submodule update because I don't think we should invoke git submodule for this use case.) We then adapt our code in mne/io/curry.py to be a wrapper around their code that basically just gets things into proper MNE container objects; and know that we might need to tweak our wrappers any time the vendored code is updated.

  2. Fully incorporate their code. Re-write their reader to align better with our codebase, in terms of variable names, idioms like _check_option or _validate_type, features like preload=False, etc.

Personally I lean toward option 2. I say this because if we're going to try to support curry files, at a minimum we need to be able to fix bugs when they arise, and ideally we should be willing/able to incorporate new features that have high demand from our users (preload=False is an obvious first example). But if we're fixing bugs, do we open PRs to upstream (with no guarantee of responsiveness), or tweak our "thin" wrapper to handle more and more corner cases? Neither option is appealing, so at that point it starts to seem easier to me to just maintain the entire reader ourselves.

drammock avatar Mar 28 '25 14:03 drammock

hum... my first reaction is to push a version 0.1 of their package on pypi and rely on this. Basically we maintain a fork and hope that the fork changes are accepted upstream... it feels less hacky and they also have a CI and some testing setup with test_data that I would not duplicate in mne-python...

Message ID: @.***>

agramfort avatar Mar 28 '25 20:03 agramfort

hum... my first reaction is to push a version 0.1 of their package on pypi and rely on this. Basically we maintain a fork and hope that the fork changes are accepted upstream... it feels less hacky and they also have a CI and some testing setup with test_data that I would not duplicate in mne-python...

that indeed is less hacky than my approach to vendoring. I'd be OK with that outcome, though curious what @larsoner will think.

drammock avatar Mar 28 '25 20:03 drammock

I'm fine with that idea but it would be good to get some blessing/permission from them to do this

larsoner avatar Mar 29 '25 17:03 larsoner

@CurryKaiser

I'm fine with that idea but it would be good to get some blessing/permission from them to do this

xref to https://github.com/mne-tools/mne-python/issues/12855#issuecomment-2767203036 where I've asked for confirmation that Compumedics really doesn't want to be the packager and they're OK with us doing it.

drammock avatar Mar 31 '25 19:03 drammock

@CurryKaiser

I'm fine with that idea but it would be good to get some blessing/permission from them to do this

xref to #12855 (comment) where I've asked for confirmation that Compumedics really doesn't want to be the packager and they're OK with us doing it.

And nothing has changed, so all good from our side. Sorry we couldn't package it for you. And thank you for working on this!

CurryKaiser avatar Apr 01 '25 09:04 CurryKaiser

thanks @CurryKaiser !

ok, sounds like a plan.. I can start working on this again soon, if you give a go @agramfort @drammock @larsoner

I guess the fork should live in the mne-tools org? I have the necessary rights to create it

dominikwelke avatar Apr 01 '25 18:04 dominikwelke

Yeah I think so

larsoner avatar Apr 01 '25 18:04 larsoner

Yeah I think so

I already made the fork

drammock avatar Apr 01 '25 18:04 drammock

ok, sounds like a plan.. I can start working on this again soon, if you give a go @agramfort https://github.com/agramfort @drammock https://github.com/drammock @larsoner https://github.com/larsonerMessage ID: @.***>

+1

agramfort avatar Apr 01 '25 18:04 agramfort

xref to https://github.com/mne-tools/curry-python-reader/pull/1

drammock avatar Apr 02 '25 20:04 drammock

i could need some guidance on 2 things:

  1. channel locations: curry files come with channel locations, and for EEG it was straight forward to build a montage and apply. but for MEG it seems i need to use other functions. any pointers would help! do i need to populate info["dig"] directly?

  2. HPI/CHPI data: some MEG files seem to come with these data. how do i store them in the raw object?

a few other things to discuss:

  • preload easiest would be to not offer preload=False and just load the data to memory. a single load_data() call would also be doable with the official reader, but a chunk reader not really (if we dont want to hack it; e.g. load all data and discard large parts). not sure i'm deep enough in the mne codebase to know what the implications are (e.g. computations, plots etc with unloaded data) what are your thought?

  • epoched files the reader code looks as if there could be files with epoched recordings, but there are none among their sample files. do any of you know more about this? otherwise ill ask the curry devs

dominikwelke avatar Apr 02 '25 22:04 dominikwelke

p.s. and could you remind me how to switch off the CIs when pushing these early commits?

dominikwelke avatar Apr 02 '25 22:04 dominikwelke

Push commits with [ci skip] in the commit message and they long / expensive CIs shouldn't run (a few quick ones still will I think)

larsoner avatar Apr 03 '25 16:04 larsoner

... for the cHPI stuff it's probably easiest to load a Neuromag raw FIF file with cHPI info and look at how the info is stored for example in info["hpi_subsystem"]. You can also look at the Info docs, especially the notes. It's not complete but it will help.

For preload, since preload=False is in main it would be a functionality regression to remove it. Once you know how the data are stored on disk and how to read an arbitrary time slice from it, it's really not bad to do the work to make preload=False work. So if you can figure this part out in some function, I can help you fit it into the _read_segment_file code. Since the .py file is only a few hundred lines (a lot of which seems like plotting etc. that we won't use), I'm cautiously optimistic we can figure it out and make it work. And then the python-curry-reader code can really be for reading metadata, annotations, sensor locs, etc. plus knowing where to read the data from disk. We can probably even keep the existing _read_segment_file, it should work in theory...

larsoner avatar Apr 03 '25 16:04 larsoner

ok, _read_segment_file does indeed work unchanged. the reader should now be more/less functional

  • I'd still need some guidance on handling/storing the channel locations, esp. for MEG data
  • HPI data - looks like I got it wrong - there might not be cHPI data after all, only HPI marker locations provided in different formats depending on the system

dominikwelke avatar Apr 14 '25 11:04 dominikwelke

@CurryKaiser thanks for the permission to use the code, also from my side!

in another place you said you might be able to provide us with test files - could we perhaps get a small one with epoched recordings in it (format version shouldn't matter)? your repository for the python reader contains some test files that the reader interprets as epoched, but they dont seem to really be (perhaps the files were truncated for size)

dominikwelke avatar Apr 14 '25 11:04 dominikwelke

Could be that they were truncated, let me check.

CurryKaiser avatar Apr 15 '25 11:04 CurryKaiser

Ok, try these: EpochedData

CurryKaiser avatar Apr 15 '25 13:04 CurryKaiser

thanks for the file @CurryKaiser fyi, we have now packaged and published the curryreader on PyPI. it can be installed via pip install curryreader

dominikwelke avatar Apr 16 '25 16:04 dominikwelke

@drammock @larsoner @agramfort it is on PyPI but not on conda-forge - how is this case dealt with in MNE? should we also submit it to conda forge?

currently pip install mne[full] fetches it, but conda env create --file environment.yml doesnt

related question: which pip dependency level in pyproject.toml should this go to? i treated curryreader like the antio package (for ANT neuro files) but this makes it an optional requirement (in mne[full]). i believe this mean it wont be automatically installed when calling pip install mne?

dominikwelke avatar Apr 16 '25 16:04 dominikwelke

it is on PyPI but not on conda-forge - how is this case dealt with in MNE? should we also submit it to conda forge?

Yeah use grayskull, it's not too painful, see for example https://github.com/conda-forge/staged-recipes/pull/28279

larsoner avatar Apr 16 '25 16:04 larsoner

Yeah use grayskull,

see conda-forge PR: https://github.com/conda-forge/staged-recipes/pull/29754

dominikwelke avatar Apr 17 '25 09:04 dominikwelke

see conda-forge PR: conda-forge/staged-recipes#29754

now available via conda-forge

dominikwelke avatar May 09 '25 17:05 dominikwelke

@larsoner @drammock - could you point me to where i set the curryreader dependency for the failing pipelines Tests/ubuntu-latest/pip and mne-tools.mne-python? i tried a few places already, and all other pipelines fetch it..

the other fails are due to my code, i'll solve that down the line

dominikwelke avatar May 14 '25 18:05 dominikwelke

could you point me to where i set the curryreader dependency for the failing pipelines Tests/ubuntu-latest/pip and mne-tools.mne-python?

Before doing that... any test that requires curryreader should have a pytest.importorskip that skips the test rather than fails. So the easiest way to do it is actually to have a module-level

pytest.importorskip("curryreader")

in mne/io/curry/tests/test_curry.py.

... but then to actually add it in the "right places", I would use git grep mffpy and git grep edfio to see where we install those optional dependencies. Doing the same for curryreader should be good enough!

larsoner avatar May 14 '25 18:05 larsoner

i treated curryreader like the antio package

yeah, had already done this with antio and edfio. mffpy was a good lead, gave a new location. but mne-tools.mne-pythonstill fails

dominikwelke avatar May 15 '25 23:05 dominikwelke

I had some deadlines and will now be on vacation for a while. will continue working on the PR afterward

dominikwelke avatar May 21 '25 21:05 dominikwelke

users are asking for this; xref to forum post: https://mne.discourse.group/t/mne-io-read-raw-curry-for-curry-9/11245/2

drammock avatar Jun 12 '25 22:06 drammock

users are asking for this; xref to forum post: https://mne.discourse.group/t/mne-io-read-raw-curry-for-curry-9/11245/2

good to hear that. just came back from annual leave and can start working on it again

dominikwelke avatar Jun 16 '25 17:06 dominikwelke