mne-icalabel
Discussion about the API for mne_icalabel and mne
I would like to discuss a bit what the API would look like, both on the `mne_icalabel` side and on the `mne` side. Here is my point of view:
mne_icalabel
As described in the readme, this package is not only a port of the popular ICLabel
network for EEG but could host other models for EEG, MEG, and iEEG. Thus, I would structure the package as:
```
mne_icalabel
└─ iclabel
   └─ assets
   └─ tests
   └─ network.py
   └─ features.py
└─ model2
└─ model3
└─ ...
└─ icalabel.py
```
The main entry point from a user perspective would be in `mne_icalabel/icalabel.py`, through a function that takes as input a `Raw` or `Epochs` instance, an ICA decomposition, and a method (`str`). The `method` argument would have a couple of accepted values, the names of the models, e.g. `'iclabel'`, and would be responsible for selecting the model used.
Depending on the model used, the number of classes into which the components are classified may vary. But in any case, the output should be the label (which class) and the probability that each component belongs to each class (the scores).
In addition to this main entry point, I would keep public a function to retrieve the features of a model, and a function to run a set of features through the model, e.g. for ICLabel, `mne_icalabel/iclabel/features.py:get_features` and `mne_icalabel/iclabel/network.py:run_iclabel`. Everything else would become private.
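A minimal sketch of what that dispatch in `mne_icalabel/icalabel.py` could look like; the function and helper names are hypothetical, and the returned scores are uniform placeholders rather than a real model output:

```python
import numpy as np


def label_components(inst, ica, method="iclabel"):
    """Hypothetical entry point: dispatch on ``method`` to a model.

    Returns, for each independent component, the winning class label
    and the per-class probabilities (scores).
    """
    methods = {"iclabel": _iclabel_label_components}
    if method not in methods:
        raise ValueError(
            f"Unknown method {method!r}, expected one of {list(methods)}"
        )
    scores = methods[method](inst, ica)  # shape (n_components, n_classes)
    labels = np.argmax(scores, axis=1)
    return labels, scores


def _iclabel_label_components(inst, ica):
    # Placeholder: the real implementation would compute the ICLabel
    # features (get_features) and run them through the network
    # (run_iclabel). Uniform scores are returned for illustration only.
    n_components, n_classes = 3, 7
    return np.full((n_components, n_classes), 1.0 / n_classes)
```

The dictionary-based dispatch keeps adding a new model as simple as registering one more `method` name.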
mne
In version 1.1, there are already 4 methods to find bad components in `mne.preprocessing.ICA`: `find_bads_ecg`, `find_bads_eog`, `find_bads_muscle` and `find_bads_ref`. Following this pattern, I would go for a `find_bads` method that would wrap around the entry point in `mne_icalabel/icalabel.py`, with the 2 parameters `inst` and `method`. As for the 4 other methods, the labels can be stored in `ica.labels_`.
I think it would also be worth storing the probability that each component belongs to each class (the scores) in an attribute similar to `ica.labels_`, both for this new `find_bads` method and for all the others as well.
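To make the proposed storage concrete, here is a toy sketch; the class is a stand-in for `mne.preprocessing.ICA`, the `labels_scores_` attribute name is only a suggestion, and the class names and scores are fabricated:

```python
import numpy as np


class ICASketch:
    """Toy stand-in for mne.preprocessing.ICA, only to illustrate storage."""

    def __init__(self):
        self.labels_ = {}           # class name -> component indices
        self.labels_scores_ = None  # (n_components, n_classes) probabilities

    def find_bads(self, inst, method="iclabel"):
        # The real method would call the mne_icalabel entry point;
        # the class names and scores below are fabricated.
        class_names = ["brain", "muscle", "eye", "other"]
        scores = np.array([[0.90, 0.05, 0.03, 0.02],
                           [0.10, 0.80, 0.05, 0.05]])
        winners = scores.argmax(axis=1)
        for idx, name in enumerate(class_names):
            # Same convention as the existing find_bads_* methods:
            # labels_ maps a class to the components attributed to it.
            self.labels_[name] = np.where(winners == idx)[0].tolist()
        self.labels_scores_ = scores
        return self
```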
I brought the idea here a while ago: https://github.com/mne-tools/mne-python/issues/9846 but I didn't follow up on it yet. It does require a new ICA FIFF field for a 2D matrix of floats.
Finally, from an MNE perspective, the API could be simplified a bit further. Eventually, the `find_bads` method could be the only one remaining, with the 4 others moved to `mne_icalabel` as models and called through `find_bads` with the correct method keyword specified.
> mne_icalabel
>
> As described in the readme, this package is not only a port of the popular ICLabel network for EEG but could host other models for EEG, MEG, and iEEG. Thus, I would structure the package as:
>
> ```
> mne_icalabel
> └─ iclabel
>    └─ assets
>    └─ tests
>    └─ network.py
>    └─ features.py
> └─ model2
> └─ model3
> └─ ...
> └─ icalabel.py
> ```
This is essentially what I had in mind, so very much agreed.
> In addition to this main entry point, I would keep public a function to retrieve the features of a model, and a function to run a set of features through the model, e.g. for ICLabel, it would be `mne_icalabel/iclabel/features.py:get_features` and `mne_icalabel/iclabel/network.py:run_iclabel`. Everything else would become private.
Yes, I believe this is a good route as well; that way we only maintain an API of what is necessary from a user/researcher perspective.
> mne
>
> In version 1.1, there are already 4 methods to find bad components in `mne.preprocessing.ICA`: `find_bads_ecg`, `find_bads_eog`, `find_bads_muscle` and `find_bads_ref`. Following this pattern, I would go for a `find_bads` method that would wrap around the entry point in `mne_icalabel/icalabel.py`, with the 2 parameters `inst` and `method`. As for the 4 other methods, the labels can be stored in `ica.labels_`.
I believe these methods all use some pattern matching, so yes, I agree; we can make a PR in MNE-Python when this package is ported.
> I think it would also be worth storing the probability that each component belongs to each class (the scores) in an attribute similar to `ica.labels_`, both for this new `find_bads` method and for all the others as well. I brought the idea here a while ago: mne-tools/mne-python#9846 but I didn't follow up on it yet. It does require a new ICA FIFF field for a 2D matrix of floats.
I've made a change to FIFF before, so I think this will be pretty straightforward.
> Finally, from an MNE perspective, the API could be simplified a bit further. Eventually, the `find_bads` method could be the only one remaining, with the 4 others moved to `mne_icalabel` as models and called through `find_bads` with the correct method keyword specified.
Agreed. Let's raise this point to the MNE developers.
^ I think that to achieve all of the above, after #18 is merged, we will need to do the following:
- [x] set up CircleCI to run the docs build
- [x] set up the API and documentation so there are only a "few" standard functions in the public API, while the rest, as you said, are private functions
- [ ] add additional complexity to an example/tutorial that shows running ICLabel on simulations and sample datasets from MNE
- [x] migrate this repository to `mne.tools`
- [ ] then make the corresponding PRs to MNE-Python
Then we can begin planning the next phase of improvements. I think the MNE community will be excited at this new feature.
I'm glad we had the same idea in mind. Those steps work for me and make perfect sense! 🔥
So I think, in line with something like sklearn, we have the following semantics:

- `fit()`: fits parameters to a specific dataset. The ICLabel model is "fixed", so the only thing I can think of doing here is running ICA in the backend.
- `predict_proba()`: predicts the probability of each class.
- `predict()`: predicts the numerical class output (e.g. 0 = brain).
- `transform()`: applies the transform to the input raw data by removing all ICA components that are not brain.
This would need to be a class though. I can add this in #31
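A rough sketch of what such a sklearn-style class could look like; the class name is hypothetical and the probabilities are fabricated stand-ins for a real network output:

```python
import numpy as np


class ICLabelClassifier:
    """Hypothetical sklearn-style wrapper illustrating the four methods."""

    def fit(self, inst):
        # The ICLabel weights are fixed, so "fit" would at most run the
        # ICA decomposition in the backend; nothing is learned here.
        self.ica_fitted_ = True
        return self

    def predict_proba(self, inst):
        # Would run the extracted features through the network;
        # fabricated (n_components, n_classes) output for illustration.
        return np.array([[0.7, 0.2, 0.1],
                         [0.2, 0.7, 0.1]])

    def predict(self, inst):
        # Numerical class output (e.g. 0 = brain).
        return self.predict_proba(inst).argmax(axis=1)

    def transform(self, inst):
        # Would remove all ICA components not classified as brain
        # (class 0); here we just return the indices that would be kept.
        return np.where(self.predict(inst) == 0)[0]
```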
Besides this, I think a `label_components` function is still nice, with the output being the class label(?). Right now it outputs the predicted probabilities.
Moreover, I think the only necessary input to any function is `Raw`/`Epochs`, with ICA being optional because, if it is not passed in, we run ICA in the background. Once this is settled, I think we have a "stable" v0.1.
WDYT?
For the `label_components` function, how about it returns 2 elements: the labels obtained with a winner-takes-all strategy, and the predicted probabilities? The `(n_components, n_classes)` array format has merit in itself, even if it needs some further processing to get the labels.
For running the ICA in the background, sure, we can run it by default with sensible parameters if it is not provided, or we can fit it if it's provided but unfitted.
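The winner-takes-all step is essentially an `argmax` over the probability array; the class names and values below are fabricated for illustration:

```python
import numpy as np

# Predicted probabilities, shape (n_components, n_classes); values fabricated.
proba = np.array([[0.85, 0.10, 0.05],
                  [0.30, 0.45, 0.25]])
class_names = ["brain", "muscle", "other"]  # illustrative class set

# Winner-takes-all: each component gets the class with the highest score.
labels = [class_names[i] for i in proba.argmax(axis=1)]
# labels -> ["brain", "muscle"]
```

Note that in the second row the winner only has probability 0.45, which is exactly why the raw probabilities are worth returning alongside the labels.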
I am a bit worried the transformer adds unnecessary complexity. As you say, the `.fit()` method does not add anything here. MNE already has a transformer for the ICA instance, and eventually I think it's better if people mostly run `iclabel` or other models through a method of the ICA instance, e.g. `ica.label_components(method='iclabel')`.
I am also a bit worried that the winner-takes-all strategy is risky. Using it by default to determine the class of each component is fine, as long as the components that lost are not automatically added to `ica.exclude`.
IMO, the responsibility to select the components to exclude and to apply the ICA should remain in the hands of the user.
I think it would be good to store the result of `ica.label_components(method='iclabel')` in attributes of `ica` (which can also be stored to FIFF). What form would those attributes take? If we look at the `find_bads_exg` methods, the scores (predicted probabilities) are not stored, while the labels are stored in `self.labels_` as a dictionary with the class as key and an array of the component indices that were attributed to this class.
For the predicted probabilities, I don't see a cleaner representation than the `(n_components, n_classes)` array, or a dictionary with the class as key and an `(n_components,)` array as value (it could be a DataFrame; the goal is to store the class name alongside the predicted probabilities of each component).
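The two candidate representations side by side, with the existing `labels_` convention for comparison; class names and probabilities are fabricated:

```python
import numpy as np

class_names = ["brain", "muscle", "eye", "other"]  # illustrative class set
# (n_components, n_classes) array of predicted probabilities (fabricated)
scores = np.array([[0.80, 0.10, 0.05, 0.05],
                   [0.20, 0.60, 0.10, 0.10]])

# Alternative: dict mapping class name -> (n_components,) array, which keeps
# the class names alongside the probabilities without requiring pandas.
scores_dict = {name: scores[:, j] for j, name in enumerate(class_names)}

# The existing find_bads_* convention for labels_: class -> component indices.
winners = scores.argmax(axis=1)
labels_ = {name: np.where(winners == j)[0].tolist()
           for j, name in enumerate(class_names)}
```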
IMO, the outputs of the `mne-icalabel` main function should take the same form as these attributes.
Ah I see, so the proposal here is to migrate this altogether into MNE, so that it's structured within the `info` object or something.
I'm in favor of then making the output here in this package a dictionary, as you said:
- `labels` (corresponding string labels)
- `y_pred_proba` (array of predicted probabilities)
- `y_pred` (numerical labels)

The user can then pick the one they want. This can also easily be transformed into a DataFrame in one line. It's probably better not to return a DataFrame, so we don't introduce a hard dependency on pandas.
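A minimal sketch of that dictionary output; the class names and probabilities are fabricated for illustration:

```python
import numpy as np

class_names = ["brain", "muscle", "other"]  # illustrative
y_pred_proba = np.array([[0.90, 0.05, 0.05],
                         [0.10, 0.70, 0.20]])  # fabricated scores
y_pred = y_pred_proba.argmax(axis=1)

result = {
    "labels": [class_names[i] for i in y_pred],  # corresponding string labels
    "y_pred_proba": y_pred_proba,                # predicted probabilities
    "y_pred": y_pred,                            # numerical labels
}

# Turning it into a DataFrame stays a one-liner for users who want one,
# without making pandas a hard dependency of the package:
# import pandas as pd; pd.DataFrame(result["y_pred_proba"], columns=class_names)
```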
Once this is done, sounds like the bulk of work requires going to MNE-Python.
Just some notes on the steps to integrate the underlying data structures for the `ICA` class in MNE-Python:
- Maintain a fork of fiff-constants with the constants we want to add:
  - in the MNE-Python PR, temporarily point fiff-constants at this fork to show that tests pass
  - in the MNE PR, first converge on the API of what the underlying `labels_scores_` and `labels_` look like.

Check `mne/io/constants.py` to see if any constants are close enough for the additional variables we want to save. For example, the ICA annotation author could be stored via the existing constant `FIFF_author`.