
Automated artifact detection

hoechenberger opened this issue 3 years ago • 21 comments

This is just to inform you of some plans I have regarding ways to help users mark channels and time periods as bad by providing a pre-selection based on automated artifact detection in the MNE-BIDS Inspector.

High-priority:

  • [ ] detect noisy channels (MEG)
  • [x] detect flat channels (all sensor types)
  • [ ] detect muscle artifacts (all sensor types)

Noisy and flat channels would be marked as bad, and time periods with muscle artifacts would get annotated with a BAD_ prefix.

Further I was thinking about:

  • [ ] detect blinks
  • [ ] detect heart beats
  • [ ] detect flat segments (all sensor types)

Here, we should not annotate the segments as bad.
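To illustrate the kind of automated pre-selection meant here, below is a deliberately simplified flat-channel check. All names and thresholds are hypothetical; the actual implementation would rely on MNE's detection routines (e.g. for flat and noisy channels), not this toy logic:

```python
# Hypothetical sketch of flat-channel detection: a channel is flagged as flat
# when its peak-to-peak amplitude stays (near) zero. The real MNE routines
# are more sophisticated; names and the threshold are illustrative only.

def find_flat_channels(data, flat_threshold=1e-15):
    """Return names of channels whose peak-to-peak range is below threshold.

    data : dict mapping channel name -> list of float samples
    flat_threshold : hypothetical peak-to-peak cutoff (in data units)
    """
    bads = []
    for ch_name, samples in data.items():
        ptp = max(samples) - min(samples)
        if ptp < flat_threshold:
            bads.append(ch_name)
    return bads


data = {'MEG 001': [0.0] * 10,               # perfectly flat -> bad
        'MEG 002': [0.0, 1e-12, -1e-12]}     # has signal -> good
print(find_flat_channels(data))  # ['MEG 001']
```

Channels returned this way would be the ones pre-marked as bad for the user to review in the Inspector.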


As a more general question: I'm not sure at which stage we'd cross the line from "we're just annotating the raw data properly" into "we're creating a derivative" territory. What's your opinion?

hoechenberger avatar Nov 29 '20 12:11 hoechenberger

This sounds immensely helpful!

AFAIK the MNE detect_flat_chs function also marks segments ... and only if too many segments are flat does it mark the whole channel as flat. How did you want to deal with that?

"we're just annotating the raw data properly" into "we're creating a derivative" territory. What's your opinion?

as long as what we are producing could be saved in an MNE-style annotations file (or events.tsv), or a list of bad channels as in channels.tsv, I think we'd be fine and within the raw-data domain.

sappelhoff avatar Nov 29 '20 13:11 sappelhoff

AFAIK the MNE detect_flat_chs function also marks segments ... and only if too many segments are flat does it mark the whole channel as flat. How did you want to deal with that?

Yes, so this is a little annoying because it again exposes a limitation in both MNE's Annotations and BIDS: neither supports temporal markers that apply to only a subset of channels.

My idea for annotate_flat() would be based on the following algorithm:

  • run it with bad_percent set to >100%, so that no channel will get marked as bad, but we'll get annotations for all flat segments. We don't attach these annotations to the data yet.
  • mark all channels as flat that have flat-annotated segments in more than 5% of their data – this is the default threshold that annotate_flat() uses
  • attach the remaining annotations to the data, but don't prefix them with BAD_, so MNE won't auto-ignore those time periods. This will allow users to repair the affected channels e.g. using autoreject, without losing the data from the other channels in that time period

But maybe my thinking is too complicated..? WDYT? @agramfort @jasmainak?
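For illustration, the three steps above could be sketched roughly as follows, assuming per-channel flat segments were available (all function and variable names here are made up):

```python
# Hypothetical sketch of the three-step idea: split per-channel flat
# segments into (a) channels that are mostly flat -> mark bad, and
# (b) remaining segments -> keep as annotations WITHOUT a BAD_ prefix,
# so MNE won't auto-reject them and tools like autoreject could repair them.

def process_flat_segments(flat_segments, n_samples, bad_percent=5.0):
    """flat_segments: dict mapping channel name -> list of (start, stop)
    sample-index pairs detected as flat; n_samples: samples per channel;
    bad_percent: flat-fraction threshold above which a channel is bad."""
    bads = []
    annotations = []  # ('flat', channel, start, stop) tuples, not BAD_flat
    for ch, segments in flat_segments.items():
        flat_samples = sum(stop - start for start, stop in segments)
        if 100.0 * flat_samples / n_samples > bad_percent:
            bads.append(ch)  # channel flat in >5% of its data -> bad
        else:
            annotations.extend(('flat', ch, start, stop)
                               for start, stop in segments)
    return bads, annotations


segs = {'MEG 001': [(0, 60)],   # 6% flat -> whole channel bad
        'MEG 002': [(0, 2)]}    # 0.2% flat -> keep as plain annotation
bads, annots = process_flat_segments(segs, n_samples=1000)
print(bads)    # ['MEG 001']
print(annots)  # [('flat', 'MEG 002', 0, 2)]
```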

as long as what we are producing could be saved in an MNE-style annotations file (or events.tsv), or a list of bad channels as in channels.tsv, I think we'd be fine and within the raw-data domain.

I was also thinking about this again and I agree, I think as long as we're only touching the sidecars – and not the raw data per se – it should all be good, as we're essentially only amending the documentation / description of the data.

hoechenberger avatar Nov 29 '20 14:11 hoechenberger

  • mark all channels as flat that have flat-annotated segments in more than 5% of their data

Oh wait, that won't work, since the function doesn't return the individual scores. So we'd have to modify annotate_flat(). find_bad_channels_maxwell(), for example, has a return_scores switch for that specific reason.

hoechenberger avatar Nov 29 '20 14:11 hoechenberger

I just want to sound a note of caution regarding writing annotations that are outputs of automated algorithms. Long ago, after we made the first draft of BEP021, someone actually created an example dataset here: https://github.com/bids-standard/bids-examples/pull/161/files. The output annotation files looked horribly complicated to me. There is also some work coming from EEGLAB folks that you might want to check out (probably mentioned in the other PR ... or maybe it was this work by James Desjardins). It's good to be aware of these efforts. A grant was also submitted recently to push derivatives, but I'm not sure what became of it. If you have bandwidth and want to be part of the effort, I am happy to loop you in. I do not have that much time these days, but I would encourage a community effort so that BIDS does not become the monopoly of one software package.

jasmainak avatar Nov 30 '20 03:11 jasmainak

Thanks @jasmainak! Yes definitely do loop me in if you can :)

At the most basic level, I think we can agree that it would be good to have automated bad channel detection (which can be integrated into / followed by manual inspection e.g. via MNE-BIDS Inspect).

Regarding the example dataset you shared: It seems to me they added all kinds of columns and scores etc to the BIDS data. This is not what I had in mind. My main idea was really just to detect bad channels and mark some very common artifacts; but not to provide any statistics that lead to those decisions. No new column types, no fancy stuff.

The one thing that we would need to standardize is annotating bad segments of data in the time domain, which is not covered by the current BIDS specification.

But this could also be avoided:

  • use MNE's flat and noisy channel detections to mark bad channels -> don't store any "flat" or "noisy" annotations in events.tsv
  • eye blinks and ECG can simply be added to events.tsv – don't need to be marked as "bad"
  • muscle / movement artifacts – I'm not sure, maybe we should just leave those out for now

What do you think?

hoechenberger avatar Nov 30 '20 08:11 hoechenberger

How I understood it:

  • after recording, a user has converted their data to BIDS and it passes the validator
  • now they want to screen the raw data and annotate it
    • bad channels
    • bad segments
    • etc.
  • they open the MNE-BIDS inspector, which seamlessly reads their BIDS dataset ... and now the user can review data file after data file
  • The user can either:
    • make no setting ... then they get the raw data as is, and annotate everything manually
    • make some setting, for example "premark_eyeblinks"
  • if a setting was passed to the inspector, it uses MNE or any function to automatically annotate what the algorithm thinks are eyeblinks in the data
  • the user STILL has to go through the data, and unselect false positives from the auto-annotation
  • finally, after all annotations are done (either manually, or semi-manually, or automatically in case a user did not go through the data to check for false positives from automatic marking), the data gets written:
    • bad segments go into events.tsv
    • bad channels go into channels.tsv
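The final writing step described above could be sketched like this. The column names follow BIDS; the BAD_-style trial_type values are only the provisional convention discussed in this thread, not (yet) part of the specification:

```python
# Sketch: bad segments become rows in events.tsv; bad channels update the
# status / status_description columns of channels.tsv. TSV writing is done
# with the stdlib; mne-bids would handle the actual file I/O.
import csv
import io


def write_bad_segments_tsv(segments):
    """segments: list of (onset, duration, description) tuples."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter='\t', lineterminator='\n')
    writer.writerow(['onset', 'duration', 'trial_type'])
    for onset, duration, description in segments:
        writer.writerow([onset, duration, description])
    return buf.getvalue()


def mark_bad_channels(channels, bads, reason):
    """channels: list of dicts with a 'name' key; returns updated copies."""
    out = []
    for ch in channels:
        ch = dict(ch)
        if ch['name'] in bads:
            ch['status'] = 'bad'
            ch['status_description'] = reason
        else:
            ch.setdefault('status', 'good')
        out.append(ch)
    return out


print(write_bad_segments_tsv([(12.3, 0.5, 'BAD_muscle')]))
print(mark_bad_channels([{'name': 'EEG 001'}, {'name': 'EEG 002'}],
                        bads=['EEG 002'], reason='flat'))
```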

What we need to "fix" in a larger group with more stakeholders:

  1. how to annotate "bad segments" for subgroups of channels instead of "ALL" channels --> for now I'd say that we ignore this
  2. how to encode "bad segments" into events.tsv

For point 2, I would recommend a straightforward way.

--> we do not know yet, how BIDS will decide this should be done. So we just pick our own convention ... e.g., using consistent event descriptions for "muscle artifact" like "BAD_MUSCLE_SEGMENT" (or whatever). These then get written with their onset and duration to events.tsv

--> MNE-BIDS will be able to read these from events.tsv as well (but only the specific descriptions that we decide to use).

--> when BIDS in the future standardizes the descriptions to be used and they are different from what we decided on, we switch a couple of lines in our code, ... and continue to support our then "legacy" style
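The "pick our own convention now, map to the BIDS standard later" idea could look like the sketch below. The description strings are placeholders, not decided conventions:

```python
# Sketch: the reader treats only a fixed, known set of descriptions as
# MNE-BIDS bad-segment markers. When BIDS standardizes its own vocabulary,
# a second (standard) set can be recognized alongside this "legacy" one.

MNE_BIDS_BAD_DESCRIPTIONS = {
    'BAD_muscle',   # placeholder convention for muscle artifacts
    'BAD_flat',     # placeholder convention for flat segments
}


def is_known_bad_segment(description):
    """Return True only for descriptions this convention recognizes."""
    return description in MNE_BIDS_BAD_DESCRIPTIONS


print(is_known_bad_segment('BAD_muscle'))     # True
print(is_known_bad_segment('stimulus/face'))  # False
```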


A potential issue may be that users do not screen their data after MNE's "auto" detection of bad chs and segments --> and then in the resulting dataset, the annotations may look as "manual", but are in fact automatic.

However, I am not sure how grave this issue is compared to no annotations at all. --> "good users" will also document this in their README. Or we can document it for them.

sappelhoff avatar Nov 30 '20 08:11 sappelhoff

A potential issue may be that users do not screen their data after MNE's "auto" detection of bad chs and segments --> and then in the resulting dataset, the annotations may look as "manual", but are in fact automatic.

The Inspector currently adds a status_description to channels that were interactively marked as bad:

https://github.com/mne-tools/mne-bids/blob/409dd36fa1ce422b30b599e26adb09c771818d8d/mne_bids/inspect.py#L273

We could do the same thing for automatically detected bads.

To my knowledge no similar column exists for events.tsv, though.

hoechenberger avatar Nov 30 '20 09:11 hoechenberger

What we need to "fix" in a larger group with more stakeholders:

  1. how to annotate "bad segments" for subgroups of channels instead of "ALL" channels --> for now I'd say that we ignore this

Ignoring it meaning: simply doing the annotation (and therefore marking more data as "bad" than necessary), or not doing the annotation for now?

  1. how to encode "bad segments" into events.tsv

For point 2, I would recommend a straightforward way.

--> we do not know yet, how BIDS will decide this should be done. So we just pick our own convention ... e.g., using consistent event descriptions for "muscle artifact" like "BAD_MUSCLE_SEGMENT" (or whatever). These then get written with their onset and duration to events.tsv

--> MNE-BIDS will be able to read these from events.tsv as well (but only the specific descriptions that we decide to use).

--> when BIDS in the future standardizes the descriptions to be used and they are different from what we decided on, we switch a couple of lines in our code, ... and continue to support our then "legacy" style

Sounds like a good plan to me!

hoechenberger avatar Nov 30 '20 09:11 hoechenberger

Ignoring it meaning: simply doing the annotation (and therefore marking more data as "bad" than necessary), or not doing the annotation for now?

I'd go for doing, and marking more bad than strictly necessary.

In a perfect world we'd mark specific channels as bad in specific time intervals, and also be able to interpolate those time intervals and channels :man_shrugging:

sappelhoff avatar Nov 30 '20 09:11 sappelhoff

and also be able to interpolate these time intervals and channels 🤷‍♂️

I think this is what autoreject can do on an Epochs level. But yeah…

hoechenberger avatar Nov 30 '20 09:11 hoechenberger

I'd go for doing, and marking more bad than strictly necessary.

In this case we should definitely discuss adding the equivalent of channel.tsv's status_description column to events.tsv, so we can keep track of which channels were responsible for a specific "bad segment" annotation

hoechenberger avatar Nov 30 '20 09:11 hoechenberger

I think that to make a pragmatic decision here, we need to ask ourselves: what would I do if I had these annotations?

I know how to deal with bad channels -> I ignore them or interpolate them.
I know how to deal with a bad segment -> I ignore it.

now if I have 12 different ways to define it as BAD will it change the way I process things?

if I have bad annotations that are channel-specific, what would I do? do local interpolations? how much is this actually being done? how common is this use case?

I would like to avoid over engineering. Let's first start with things that will make us one step forward and let's iterate when we feel limited.

Note that the convention for starting with BAD we use in MNE is taken from Brainstorm

agramfort avatar Nov 30 '20 21:11 agramfort

I agree in principle, but I'm worried that different software packages will come up with different ways to solve this problem ad hoc (custom columns, different algorithms, etc.) and we will lose standardization. Annotations written by EEGLAB algorithms cannot be read in MNE and vice versa.

jasmainak avatar Dec 01 '20 05:12 jasmainak

Ok so let's just follow the basic approach @agramfort suggested: problematic channels will be marked as bad (and we'll add a comment to status_description), and problematic time segments will get a BAD_ entry in events.tsv.

And we should start a discussion over at bids-specification on how to standardize those annotations – we need to do that anyway to know how to deal with events starting with bad – think bad_emotion vs good_emotion experimental events – currently MNE/MNE-BIDS would simply remove those from the data!
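The pitfall mentioned above is easy to demonstrate. This mirrors MNE's convention of rejecting spans whose annotation description starts with "bad" (case-insensitively); the event names are made up:

```python
# Minimal illustration: MNE treats any annotation whose description begins
# with "bad" (case-insensitive) as marking data to reject. An experimental
# condition that happens to be named "bad_emotion" would be dropped too.

def would_be_rejected(description):
    return description.lower().startswith('bad')


print(would_be_rejected('BAD_muscle'))    # True  (intended)
print(would_be_rejected('bad_emotion'))   # True  (false positive!)
print(would_be_rejected('good_emotion'))  # False
```

This is why the discussion on bids-specification needs to settle on an unambiguous way to distinguish quality markers from experimental event names.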

Deal?

hoechenberger avatar Dec 01 '20 08:12 hoechenberger

deal

@jasmainak who are the people leading these efforts on the BIDS side?

agramfort avatar Dec 01 '20 08:12 agramfort

At the very minimum, from an iEEG perspective, I would 200% use anything that helps me automate detection of "flat" and "high-frequency" noise channels, because it's super tedious for me to fire up EDFBrowser and run through raw files, especially if I have >20 subjects. It sounds like this is going to happen for sure, and the main discussion is around "bad channel time windows".

Thanks @jasmainak! Yes definitely do loop me in if you can :)

Would also like to be looped in if possible :).

In this case we should definitely discuss adding the equivalent of channel.tsv's status_description column to events.tsv, so we can keep track of which channels were responsible for a specific "bad segment" annotation

Right now in the specification, there are requirements to link columns of different files. For example, *electrodes.tsv and *channels.tsv need to have corresponding groups when assigning groups to different channels, and they MUST match in both files. In the same way, an extra OPTIONAL status_description column in events.tsv SHOULD/MUST correspond to the status_description in channels.tsv when it describes a bad channel epoch.

adam2392 avatar Dec 01 '20 16:12 adam2392

I came up with the first draft of BEP021 with Dora Hermes at the BIDS sprint in Aug 2017. However, at the time, BIDS-EEG and BIDS-iEEG were not yet finalized, so we could not get feedback from the major players and move it forward. Now, I don't have as much time to work on this as I did 1.5 years ago ... would be glad if someone wanted to take over. I believe Cyril Pernet wants to take the lead, and Guiomar Niso and Arno were also interested. I just don't know who is going to put in the actual work :) @sappelhoff do you think this is an accurate summary of the situation?

jasmainak avatar Dec 01 '20 21:12 jasmainak

There was also a CZI grant submitted by Cyril last cycle, so I'd say get in touch with Cyril and Arno if you want to contribute to the effort

jasmainak avatar Dec 01 '20 21:12 jasmainak

yes, I think that's accurate. As for getting in touch, issues here: https://github.com/bids-standard/bep021/ or comments here https://bids.neuroimaging.io/bep021 would work as well.

sappelhoff avatar Dec 02 '20 08:12 sappelhoff

I don't really have the bandwidth now... :( :(

agramfort avatar Dec 02 '20 21:12 agramfort

Sounds like a job for me! 😁

hoechenberger avatar Dec 02 '20 21:12 hoechenberger