BUG: Fix annotation rejection idx
Reference issue (if any)
Addresses #12893
What does this implement/fix?
As discussed in the issue above, I implemented a fix so that the last sample of "bad" annotations is now also rejected in `raw.get_data`.
However, I noticed that some other functions, for example `filter`, still behave like `get_data` did before, i.e., the last sample of the "bad" annotation is not rejected. For consistent behavior, we should adapt these functions accordingly as well, right?
I'm happy about any other feedback and comments.
Changes
- Keep bad annotations that end at the start sample (annotations that are potentially one sample long): `end = start`
- Do not discard bad annotations that are exactly one sample long: `onset = end`
- Also discard the last sample of bad annotations: `end - start + 1`
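To illustrate the off-by-one point from the changes above, here is a minimal NumPy sketch (the sample indices are made up for illustration; this is not MNE code):

```python
import numpy as np

data = np.arange(31)  # 31 fake samples standing in for channel data
start, end = 10, 20   # sample indices of a hypothetical "bad" annotation

# Half-open slicing drops the last annotated sample (the old behavior):
bad_old = data[start:end]      # samples 10..19 -> 10 samples rejected

# Including the last sample rejects end - start + 1 samples (the fix):
bad_new = data[start:end + 1]  # samples 10..20 -> 11 samples rejected

print(len(bad_old), len(bad_new))  # 10 11
```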
Thanks for the PR!
> However, I noticed that some other functions, for example `filter`, still behave like `get_data` did before, i.e., the last sample of the "bad" annotation is not rejected. For consistent behavior, we should adapt these functions accordingly as well, right?
Indeed, but that's where this private helper function comes in:
```python
onsets, ends = _annotations_starts_stops(self, ["BAD"])
```
I suspect you will find `_annotations_starts_stops` used throughout the entire code base to select which samples to retain. You should modify this function to correctly include the last sample, instead of modifying each individual function that uses `_annotations_starts_stops` to find the end indices.
Thank you for the helpful comment!
So, I have been thinking about the function `_annotations_starts_stops` a little bit. I think we just need to get the desired behavior straight first, and then I am happy to implement everything. I realized that some people (also contributors/maintainers) may have a different mental model of what the start and stop indices mean. Per se, the function `_annotations_starts_stops` does what it is supposed to: if I have a Raw object with a sampling rate of 10 Hz and a "BAD" annotation from 0 to 10 s, it returns `starts = [0]` and `stops = [100]`:
```python
import numpy as np
import mne
from mne import Annotations
from mne.annotations import _annotations_starts_stops

sfreq = 10
info = mne.create_info(1, sfreq, "eeg")
raw = mne.io.RawArray(np.random.RandomState(0).randn(1, 30 * sfreq), info)
annotations = Annotations([0], [10], ['BAD'], raw.info['meas_date'])
raw.set_annotations(annotations)
starts, stops = _annotations_starts_stops(raw, 'BAD')
print(starts, stops)
```
100 is in fact the last sample of the bad segment, so if I do `raw.get_data()[..., stops]`, I can get the last sample of the annotation. However, if I do slicing, the last sample is not included (as per Python convention): `raw.get_data()[..., starts[0]:stops[0]]`.
Thus how the start and stop indices are used depends on what the user intends to do.
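The indexing-versus-slicing distinction can be demonstrated without MNE at all; a quick NumPy sketch using the same hypothetical numbers (sfreq = 10 Hz, a 10 s bad segment):

```python
import numpy as np

data = np.arange(301)       # stand-in for 30 s of data at 10 Hz (301 values)
starts, stops = [0], [100]  # as returned for the BAD annotation above

# Direct indexing *does* reach the last sample of the bad segment:
last_sample = data[stops[0]]        # value 100

# Python slicing is half-open, so that same sample is *excluded*:
segment = data[starts[0]:stops[0]]  # samples 0..99

print(last_sample, segment[-1], len(segment))  # 100 99 100
```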
In brief, I had implemented the change in `_annotations_starts_stops`, and some of the tests break, for example `test_annotation_filtering`. With the changes, filtering Raw objects that have been concatenated no longer works as intended: the "EDGE" annotation (a single sample at the start of each concatenated segment) that was added during concatenation is not filtered at all anymore.
The question is if we still want to go ahead with the change in `_annotations_starts_stops`, or make the individual changes to the functions where the change is desired, like in `get_data()`. Looking forward to hearing your opinion.
> The question is if we still want to go ahead with the change in `_annotations_starts_stops`, or make the individual changes to the functions where the change is desired, like in `get_data()`. Looking forward to hearing your opinion.
How about a new kwarg `_annotations_starts_stops(..., include_last=False)` so that our existing tests don't break, and you can set `include_last=True` in the places where you need this new functionality? Hopefully we don't need to make this public, but if we need to, we can.
FWIW we have for example `raw.crop(..., include_tmax=True)` for a similar issue when cropping. So maybe we want to call it `include_tmax` here if we do need to make it public...
Thanks, that sounds good. I will go ahead and implement it with the new kwarg!
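For illustration, a simplified, hypothetical sketch of what such a kwarg could look like (this is a stand-alone stand-in, not MNE's actual `_annotations_starts_stops` implementation):

```python
import numpy as np

def annotations_starts_stops(onsets_sec, durations_sec, sfreq, include_last=False):
    """Convert annotation onsets/durations in seconds to sample indices.

    With include_last=False (the default) the stop index follows Python's
    half-open convention; include_last=True extends it by one sample so
    that slicing also covers the last annotated sample.
    """
    onsets = np.asarray(onsets_sec, float)
    starts = np.round(onsets * sfreq).astype(int)
    stops = np.round((onsets + np.asarray(durations_sec, float)) * sfreq).astype(int)
    if include_last:
        stops = stops + 1  # so data[start:stop] includes the last sample
    return starts, stops

# A 10 s BAD annotation at sfreq = 10 Hz:
starts, stops = annotations_starts_stops([0.0], [10.0], sfreq=10)
_, stops_incl = annotations_starts_stops([0.0], [10.0], sfreq=10, include_last=True)
print(starts, stops, stops_incl)  # [0] [100] [101]
```

Keeping `include_last=False` as the default would leave existing callers (and tests) unaffected, with only `get_data` opting in.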
Okay, so after implementing `_annotations_starts_stops(..., include_last=False)`, I am running into conflicts between different parts of the API:
Because `get_data` now uses `include_last=True` whereas `filter` does not, we get different results when first calling `get_data` with rejection of bad segments and then `filter`, as compared to calling `filter` on the Raw object directly while rejecting bad segments via `skip_by_annotation`, as observed by this test failing. I could not set `include_last=True` in `filter` because of my above-mentioned observation in the case of 'edge' annotations:
> With the changes, filtering Raw objects that have been concatenated no longer works as intended: the "EDGE" annotation (a single sample at the start of each concatenated segment) that was added during concatenation is not filtered at all anymore.
I think the underlying problem is that 'edge' annotations are a special case of annotations. If I understand correctly, they are one-sample-long annotations that mark the beginning/end of a segment when different segments have been concatenated into a single data block. So the 'edge' annotations do not mark bad data, in contrast to, for example, the annotation of bad acquisitions by 'bad_acq_skip'. But the default parameter `skip_by_annotation=('edge', 'bad_acq_skip')` treats 'edge' as if it were also a bad segment. So whether we have annotated a bad segment or the edge of a segment, `filter` calls `_annotations_starts_stops` in exactly the same way and relies on its current default behavior.
I guess there are different potential ways forward from here (simply listed, not ordered by preference):
1. Give the users the choice by making `include_tmax` public in `get_data` or `filter`, as suggested by @larsoner.
2. Change the public API of `filter` to introduce different behavior for bad segments and edge annotations, for example with a new keyword such as `split_by='edge'`.
3. Handle annotations in `skip_by_annotation` that begin with 'edge' as special cases internally (within `filter`), and document this behavior in the docstring. This way the default behavior of `filter` with 'edge' annotations is unchanged for users.
I'm grateful for any feedback! Thanks
I prefer (3); IMO this problem seems to be a bug within our private code/functions, and there is no reason to complicate the public API to tackle an internal bug.
Yeah I'm also okay with (3).
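To make option (3) concrete: inside `filter`, annotations matched by `skip_by_annotation` whose description starts with 'edge' would act as segment boundaries (each side filtered separately), while the remaining matches are treated as bad spans and skipped. A rough, hypothetical sketch of that partitioning (not MNE code; the names are made up):

```python
def partition_skip_annotations(annotations, skip_by_annotation=("edge", "bad_acq_skip")):
    """Split matched annotations into segment boundaries vs. bad spans.

    `annotations` is a list of (onset_sample, n_samples, description)
    tuples. 'edge' markers become split points between independently
    filtered segments; other matches are skipped entirely.
    """
    prefixes = tuple(p.lower() for p in skip_by_annotation)
    split_points, bad_spans = [], []
    for onset, n_samples, desc in annotations:
        d = desc.lower()
        if not d.startswith(prefixes):
            continue  # annotation not selected by skip_by_annotation
        if d.startswith("edge"):
            split_points.append(onset)  # filter each side separately
        else:
            bad_spans.append((onset, onset + n_samples))  # skip when filtering
    return split_points, bad_spans

annots = [(100, 0, "EDGE boundary"), (40, 20, "bad_acq_skip"), (10, 5, "comment")]
print(partition_skip_annotations(annots))  # ([100], [(40, 60)])
```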
Regarding:
> I think the underlying problem is that 'edge' annotations are a special case of annotations. If I understand correctly, they are one-sample-long annotations that mark the beginning/end of a segment when different segments have been concatenated into a single data block.
Do we actually set `duration=1 / sfreq` for edge annotations? To me this seems like a bug... they should probably have duration 0., as they are essentially instantaneous occurrences. (This could make things ugly while plotting, but we could always fix that in the plotting routines.) If we changed our behavior to `duration=0` (with a shim to treat old files with a duration of 1 sample the same way), this would magically work better, because neither the segment preceding nor the segment following the annotation would overlap with the actual annotation. Or magically it could make things worse :slightly_smiling_face: So maybe we shouldn't pursue this too much now, but conceptually I think it's a way to make things consistent in the future.
> Do we actually set `duration=1 / sfreq` for edge annotations? To me this seems like a bug... they should probably have duration 0.
No, it is actually set to `duration=0`. The question is whether a zero-length annotation starting at a particular sample means that this sample should be discarded or not. I think this confusion captures the issue at hand quite well. Maybe this whole issue should be tackled by first defining the desired overall behavior, and then seeing if there is any contradicting behavior in the codebase that we actually need to change.
In the case of the 'edge' annotation it might be quite obvious that the annotation should not be discarded. But let's imagine we want to annotate a bad segment that goes from tmin=0 s to tmax=1 s (duration=1 s). If we set this annotation and then call `get_data` while rejecting bad data with `reject_by_annotation`, currently tmax of the bad segment would not be included (see code example below). We can see this by inspecting `times[0]`, which returns 1.0. However, if we call `raw.crop(tmin=1.0)`, the annotation is still included in the `raw_crop` object as a 0 s-duration annotation. It seems counterintuitive to me that the bad data is entirely discarded but the annotation is still included in the cropped Raw object; I guess this is intended behavior for special cases (such as the 'edge' annotation), in order to conserve these types of annotations.
From what I have seen in the codebase, when we have an annotation starting at 0 s with a duration of, for example, 1 s, only 0 to 0.99 s (depending on the sfreq, of course) are considered to be part of the annotation, and the sample at 1 s is not. Maybe it is too much to modify this behavior now, since it seems to be more or less consistent within MNE.

Maybe we should take a step back and think about how to solve the original problem that sparked this PR, which was that when we annotate a bad segment at the end of the recording within the data browser, the last sample is not discarded. I think this is because from the visual annotation (let's take a recording with tmin=0 and tmax=1 s), the annotation in the Raw object is calculated as, e.g., onset=tmin=0.5, tmax=1 -> duration=1-0.5=0.5 -> samples from 0.5 to 0.99 s are discarded, but the sample at 1 s is kept. I.e., there is no way in the data browser to include the last sample in the annotation, because our annotations cannot extend past the end of the recording. What could be a way to solve this? Always extend annotations from the data browser by one sample?
```python
import numpy as np
import mne
from mne import Annotations
from mne.annotations import _annotations_starts_stops

sfreq = 10
info = mne.create_info(1, sfreq, "eeg")
raw = mne.io.RawArray(np.random.RandomState(0).randn(1, 30 * sfreq), info)
annotations = Annotations([0], [1.0], ['BAD'], raw.info['meas_date'])
raw.set_annotations(annotations)
onsets, ends = _annotations_starts_stops(raw, 'BAD')  # , include_last=False)
print(onsets, ends)
_, times = raw.get_data(return_times=True, reject_by_annotation='omit')
print(times[0])
raw_crop = raw.crop(tmin=1.0)
print(raw_crop.annotations[0])
```
> No, it is actually set to `duration=0`. The question is whether a zero-length annotation starting at a particular sample means that this sample should be discarded or not.
I would say no. But also agree that converging on this view and modifying code etc. to abide by it would probably be far from trivial!
> because our annotations cannot extend past the end of the recording. What could be a way to solve this? Always extend annotations from the data browser by one sample?
It seems like you should in principle be able to extend the annotation to the end of the time span represented by the sample. So if our model mapping samples to continuous time is, for example at 1000 Hz (and based on what we've discussed above, I think it is):
```
samp    0           1           2           3
     |-----------|-----------|-----------|-----------
time 0         0.001       0.002       0.003
```
i.e., sample 0 represents the time from 0 to 0.001, sample 1 the time from 0.001 to 0.002, etc., so sample 999 represents the time from 0.999 to 1.000. Our annotations should therefore be allowed to go out to 1.000, not just 0.999. If this squares with our code, we should at least add a comment in the code, and maybe also in the `Annotations` docstring where we discuss all the `first_samp` business. And maybe cross-link to some discussion of this, like the "constant" vs "grid-constant" concepts in `scipy.ndimage`.
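The mapping described above can be written down as a tiny helper (hypothetical, just to pin down the convention):

```python
def sample_span(n, sfreq):
    """Return the half-open time interval [t0, t1) covered by sample n."""
    return n / sfreq, (n + 1) / sfreq

sfreq = 1000.0
print(sample_span(0, sfreq))    # (0.0, 0.001)
print(sample_span(999, sfreq))  # (0.999, 1.0)
```

So a 1 s recording of 1000 samples spans continuous time up to `len(raw.times) / sfreq = 1.0`, which is why annotations arguably should be allowed to extend there.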
But I digress a bit... it seems like this problem would be solved by:
- Allowing you to draw annotations to `len(raw.times) / sfreq` instead of only up to `(len(raw.times) - 1) / sfreq` (i.e., an mne-qt-browser interaction PR)
- Allowing `Annotations` to store such an annotation without cropping it (sounds like this might already be the case?)
Okay, then I think this is actually the easiest way forward:
> Allowing you to draw annotations to `len(raw.times) / sfreq` instead of only up to `(len(raw.times) - 1) / sfreq` (i.e., an mne-qt-browser interaction PR)
- I will try to prepare a PR to change this in mne-qt-browser
> Allowing `Annotations` to store such an annotation without cropping it (sounds like this might already be the case?)
- Yes, I just checked and this is indeed the case. Specifically, `set_annotations` allows you to store an annotation up to and including `len(raw.times) / sfreq`; everything exceeding that will be cropped.
Thanks a ton for the discussion!
> Okay, then I think this is actually the easiest way forward:
>
> Allowing you to draw annotations to `len(raw.times) / sfreq` instead of only up to `(len(raw.times) - 1) / sfreq` (i.e., an mne-qt-browser interaction PR)
>
> I will try to prepare a PR to change this in mne-qt-browser
@richardkoehler still interested in making that PR changing the qt-browser behavior?
@drammock I'm super sorry, I was very busy the last weeks. But thanks for the reminder, I found some time this week and will make a pull to mne-qt-browser tonight.
> @drammock I'm super sorry, I was very busy the last weeks. But thanks for the reminder, I found some time this week and will make a pull to mne-qt-browser tonight.
no worries! people get busy. was just asking so that if you're still swamped, someone else would know it's OK to pick it up without stepping on your toes :)
I think from the discussion above we're deciding to live with the mne-qt-browser PR only on this one and not change how MNE-Python itself handles things, so I'll close. But if I'm misreading let's reopen!
No, you are completely right, we decided to go with the mne-qt-browser PR. I should have closed this one but forgot, sorry!