mne-icalabel
Discussion about the API for mne_icalabel and mne
I would like to discuss a bit what the API would look like, both on the `mne_icalabel` side and on the `mne` side. Here is my point of view:
mne_icalabel
As described in the readme, this package is not only a port of the popular ICLabel
network for EEG but could host other models for EEG, MEG, and iEEG. Thus, I would structure the package as:
```
mne_icalabel
└─ iclabel
   └─ assets
   └─ tests
   └─ network.py
   └─ features.py
└─ model2
└─ model3
└─ ...
└─ icalabel.py
```
The main entry point from a user perspective would be in `mne_icalabel/icalabel.py`, through a function that takes as input a `Raw` or `Epochs` instance, an ICA decomposition, and a method (`str`). The `method` argument would have a couple of accepted values, the names of the models, e.g. `'iclabel'`, and would be responsible for selecting the model used.
Depending on the model used, the number of classes into which the components are classified may vary. But in any case, the output should be the label (which class) and the probability that each component belongs to each class (the scores).
In addition to this main entry point, I would keep public a function to retrieve the features of a model, and a function to run a set of features through the model, e.g. for ICLabel, `mne_icalabel/iclabel/features.py:get_features` and `mne_icalabel/iclabel/network.py:run_iclabel`. Everything else would become private.
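A minimal sketch of what that dispatch in `mne_icalabel/icalabel.py` could look like; the function and helper names are hypothetical, and the returned scores are uniform placeholders rather than a real model output:

```python
import numpy as np


def label_components(inst, ica, method="iclabel"):
    """Hypothetical entry point: dispatch on ``method`` to a model.

    Returns, for each independent component, the winning class label
    and the per-class probabilities (scores).
    """
    methods = {"iclabel": _iclabel_label_components}
    if method not in methods:
        raise ValueError(
            f"Unknown method {method!r}, expected one of {list(methods)}"
        )
    scores = methods[method](inst, ica)  # shape (n_components, n_classes)
    labels = np.argmax(scores, axis=1)
    return labels, scores


def _iclabel_label_components(inst, ica):
    # Placeholder: the real implementation would compute the ICLabel
    # features (get_features) and run them through the network
    # (run_iclabel). Uniform scores are returned for illustration only.
    n_components, n_classes = 3, 7
    return np.full((n_components, n_classes), 1.0 / n_classes)
```

The dictionary-based dispatch keeps adding a new model as simple as registering one more `method` name.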
mne
In version 1.1, there are already 4 methods to find bad components in `mne.preprocessing.ICA`: `find_bads_ecg`, `find_bads_eog`, `find_bads_muscle` and `find_bads_ref`. Following this pattern, I would go for a `find_bads` method that would wrap around the entry point in `mne_icalabel/icalabel.py`, with the 2 parameters `inst` and `method`. As for the 4 other methods, the labels can be stored in `ica.labels_`.
I think it would also be worth storing the probability that each component belongs to each class (the scores) in an attribute similar to `ica.labels_`, both for this new `find_bads` method and for all the others as well.
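To make the proposed storage concrete, here is a toy sketch; the class is a stand-in for `mne.preprocessing.ICA`, the `labels_scores_` attribute name is only a suggestion, and the class names and scores are fabricated:

```python
import numpy as np


class ICASketch:
    """Toy stand-in for mne.preprocessing.ICA, only to illustrate storage."""

    def __init__(self):
        self.labels_ = {}           # class name -> component indices
        self.labels_scores_ = None  # (n_components, n_classes) probabilities

    def find_bads(self, inst, method="iclabel"):
        # The real method would call the mne_icalabel entry point;
        # the class names and scores below are fabricated.
        class_names = ["brain", "muscle", "eye", "other"]
        scores = np.array([[0.90, 0.05, 0.03, 0.02],
                           [0.10, 0.80, 0.05, 0.05]])
        winners = scores.argmax(axis=1)
        for idx, name in enumerate(class_names):
            # Same convention as the existing find_bads_* methods:
            # labels_ maps a class to the components attributed to it.
            self.labels_[name] = np.where(winners == idx)[0].tolist()
        self.labels_scores_ = scores
        return self
```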
I brought the idea here a while ago: https://github.com/mne-tools/mne-python/issues/9846 but I didn't follow up on it yet. It does require a new ICA FIFF field for a 2D matrix of floats.
Finally, from an MNE perspective, the API could be simplified a bit further. Eventually, the `find_bads` method could be the only one remaining, with the 4 others moved to `mne_icalabel` as models and called through `find_bads` with the correct method keyword specified.
> mne_icalabel
>
> As described in the readme, this package is not only a port of the popular ICLabel network for EEG but could host other models for EEG, MEG, and iEEG. Thus, I would structure the package as:
>
> ```
> mne_icalabel
> └─ iclabel
>    └─ assets
>    └─ tests
>    └─ network.py
>    └─ features.py
> └─ model2
> └─ model3
> └─ ...
> └─ icalabel.py
> ```
This is essentially what I had in mind, so very much agreed.
> In addition to this main entry point, I would keep public a function to retrieve the features of a model, and a function to run a set of features through the model, e.g. for ICLabel, it would be `mne_icalabel/iclabel/features.py:get_features` and `mne_icalabel/iclabel/network.py:run_iclabel`. Everything else would become private.
Yes, I believe this is a good route as well; that way we only maintain an API of what is necessary from a user/researcher perspective.
> mne
>
> In version 1.1, there are already 4 methods to find bad components in `mne.preprocessing.ICA`: `find_bads_ecg`, `find_bads_eog`, `find_bads_muscle` and `find_bads_ref`. Following this pattern, I would go for a `find_bads` method that would wrap around the entry point in `mne_icalabel/icalabel.py`, with the 2 parameters `inst` and `method`. As for the 4 other methods, the labels can be stored in `ica.labels_`.
I believe these methods all use some pattern matching, so yes, I agree; we can make a PR in MNE-Python when this package is ported.
> I think it would also be worth storing the probability that each component belongs to each class (the scores) in an attribute similar to `ica.labels_`, both for this new `find_bads` method and for all the others as well. I brought the idea here a while ago: mne-tools/mne-python#9846 but I didn't follow up on it yet. It does require a new ICA FIFF field for a 2D matrix of floats.
I've made a change to FIFF before, so I think this will be pretty straightforward.
> Finally, from an MNE perspective, the API could be simplified a bit further. Eventually, the `find_bads` method could be the only one remaining, with the 4 others moved to `mne_icalabel` as models and called through `find_bads` with the correct method keyword specified.
Agreed. Let's raise this point to the MNE developers.
^ I think that to achieve all of the above, after #18 is merged, we will need to do the following:
- [x] set up CircleCI to run the docs build
- [x] set up the API and documentation so there are only a "few" standard functions in the public API, while the rest, as you said, are private functions
- [ ] add additional complexity to an example/tutorial that shows running ICLabel on simulations and sample datasets from MNE
- [x] migrate this repository to `mne.tools`
- [ ] then make the corresponding PRs to MNE-Python
Then we can begin planning the next phase of improvements. I think the MNE community will be excited at this new feature.
I'm glad we had the same idea in mind. Those steps work for me and make perfect sense! 🔥
So I think, in line with something like sklearn, we have the following semantics:

- `fit()`: fits parameters to a specific dataset. The ICLabel model is "fixed", so the only thing I can think of doing here is running ICA in the backend.
- `predict_proba()`: predicts the probability of each class.
- `predict()`: predicts the numerical class output (e.g. 0 = brain).
- `transform()`: applies the transform to the input raw data by removing all ICA components that are not brain.
This would need to be a class though. I can add this in #31
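A rough sketch of what such a sklearn-style class could look like; the class name is hypothetical and the probabilities are fabricated stand-ins for a real network output:

```python
import numpy as np


class ICLabelClassifier:
    """Hypothetical sklearn-style wrapper illustrating the four methods."""

    def fit(self, inst):
        # The ICLabel weights are fixed, so "fit" would at most run the
        # ICA decomposition in the backend; nothing is learned here.
        self.ica_fitted_ = True
        return self

    def predict_proba(self, inst):
        # Would run the extracted features through the network;
        # fabricated (n_components, n_classes) output for illustration.
        return np.array([[0.7, 0.2, 0.1],
                         [0.2, 0.7, 0.1]])

    def predict(self, inst):
        # Numerical class output (e.g. 0 = brain).
        return self.predict_proba(inst).argmax(axis=1)

    def transform(self, inst):
        # Would remove all ICA components not classified as brain
        # (class 0); here we just return the indices that would be kept.
        return np.where(self.predict(inst) == 0)[0]
```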
Besides this, I think a `label_components` function is still nice, with the output being the class label(?). Right now it outputs the predicted probabilities.
Moreover, I think the only necessary input to any function is `Raw`/`Epochs`, with ICA being optional because, if it is not passed in, we run ICA in the background. Once this is settled, I think we have a "stable" v0.1.
WDYT?
For the `label_components` function, how about it returns 2 elements: the labels obtained with a winner-takes-all strategy, and the predicted probabilities? The `(n_components, n_classes)` array format has merit in itself, even if it needs some further processing to get the labels.
For running the ICA in the background, sure, we can run it by default with sensible parameters if it is not provided, or we can fit it if it's provided but unfitted.
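The winner-takes-all step is essentially an `argmax` over the probability array; the class names and values below are fabricated for illustration:

```python
import numpy as np

# Predicted probabilities, shape (n_components, n_classes); values fabricated.
proba = np.array([[0.85, 0.10, 0.05],
                  [0.30, 0.45, 0.25]])
class_names = ["brain", "muscle", "other"]  # illustrative class set

# Winner-takes-all: each component gets the class with the highest score.
labels = [class_names[i] for i in proba.argmax(axis=1)]
# labels -> ["brain", "muscle"]
```

Note that in the second row the winner only has probability 0.45, which is exactly why the raw probabilities are worth returning alongside the labels.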
I am a bit worried the transformer adds unnecessary complexity. As you say, the `.fit()` method does not add anything here. MNE already has a transformer for the ICA instance, and eventually I think it's better if people mostly run `iclabel` or other models through a method of the ICA instance, e.g. `ica.label_components(method='iclabel')`.
I am also a bit worried that the winner-takes-all strategy is risky. Using it by default to determine the class of each component is fine, as long as the components that lost are not automatically added to `ica.exclude`.
IMO, the responsibility to select the components to exclude and to apply the ICA should remain in the hands of the user.
I think it would be good to store the result of `ica.label_components(method='iclabel')` in attributes of `ica` (which can also be stored to FIFF). What form would those attributes take? If we look at the `find_bads_exg` methods, the scores (predicted probabilities) are not stored, while the labels are stored in `self.labels_` as a dictionary with the class as key and an array of the component indices that were attributed to this class.
For the predicted probabilities, I don't see a cleaner representation than the `(n_components, n_classes)` array, or a dictionary with the class as key and an `(n_components,)` array as value (it could be a DataFrame; the goal is to store the class name alongside the predicted probabilities of each component).
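The two candidate representations side by side, with the existing `labels_` convention for comparison; class names and probabilities are fabricated:

```python
import numpy as np

class_names = ["brain", "muscle", "eye", "other"]  # illustrative class set
# (n_components, n_classes) array of predicted probabilities (fabricated)
scores = np.array([[0.80, 0.10, 0.05, 0.05],
                   [0.20, 0.60, 0.10, 0.10]])

# Alternative: dict mapping class name -> (n_components,) array, which keeps
# the class names alongside the probabilities without requiring pandas.
scores_dict = {name: scores[:, j] for j, name in enumerate(class_names)}

# The existing find_bads_* convention for labels_: class -> component indices.
winners = scores.argmax(axis=1)
labels_ = {name: np.where(winners == j)[0].tolist()
           for j, name in enumerate(class_names)}
```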
IMO, the outputs of the `mne-icalabel` main function should take the same form as these attributes.
Ah I see, so the proposal here is to migrate this altogether into MNE, so that it's structured within the `info` object or something.
I'm in favor of then making the output here in this package a dictionary, as you said:
- `labels` (corresponding string labels)
- `y_pred_proba` (array of predicted probabilities)
- `y_pred` (numerical labels)

The user can then pick the one they want. This can also easily be transformed into a DataFrame in one line. It's probably better not to return a DataFrame, so we don't introduce a hard dependency on pandas.
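A minimal sketch of that dictionary output; the class names and probabilities are fabricated for illustration:

```python
import numpy as np

class_names = ["brain", "muscle", "other"]  # illustrative
y_pred_proba = np.array([[0.90, 0.05, 0.05],
                         [0.10, 0.70, 0.20]])  # fabricated scores
y_pred = y_pred_proba.argmax(axis=1)

result = {
    "labels": [class_names[i] for i in y_pred],  # corresponding string labels
    "y_pred_proba": y_pred_proba,                # predicted probabilities
    "y_pred": y_pred,                            # numerical labels
}

# Turning it into a DataFrame stays a one-liner for users who want one,
# without making pandas a hard dependency of the package:
# import pandas as pd; pd.DataFrame(result["y_pred_proba"], columns=class_names)
```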
Once this is done, sounds like the bulk of work requires going to MNE-Python.
Just some notes on the steps to integrate the underlying data structures for the `ICA` class in MNE-Python:
- Maintain a fork of fiff-constants with the constants we want to add:
  - in the MNE-Python PR, temporarily point fiff-constants at this fork to show that tests pass
  - in the MNE PR, first converge on the API of what the underlying `labels_scores_` and `labels_` look like.

Check `mne/io/constants.py` to see if any constants are close enough for the additional variables we want to save. For example, the ICA annotation author could be stored via the existing constant `FIFF_author`.