Add reader for annotations in MIT format

Open drammock opened this issue 1 year ago • 3 comments

Describe the new feature or enhancement

The CHB-MIT Scalp EEG Database (PhysioNet) has annotation files that MNE can't currently read. It's been asked about at least twice on our forum (one, two), and @Teuniz helpfully explained the annotation format there, which means it is possible for us to write a reader for the format.

Describe your proposed implementation

A new private function (_read_annot_mit() or so), plus accompanying logic in the existing mne.read_annotations() function to triage to the new private function.

Describe possible alternatives

A separate public function for reading just this type of annotation. Why might that be preferable? The existing mne.read_annotations() triages based on file extension. In the dataset linked above the file extension is .seizure, but the annotation format was designed for ECG so it's likely that there are files in this format with .ecg extensions out there too, and possibly other extensions as well. This makes it hard to know what file extension(s) to use to identify this format, so a separate reader would be more flexible.

Another possibility is adding a new parameter to the existing reader func (mit_format=False or so). If this switch were True, we triage to the new private func regardless of what the file extension is.

My preference is to overload the existing reader function, and (for now) only triage based on .seizure extension (since it's the only one we know our users want supported). We can always later add to the list of file extensions that map to this format, if users ask us to.

Additional context

Here's the text of the reference spec.

 Each annotation occupies an even number of bytes. The first byte in each pair is the least significant byte. The six most significant bits (A) of each byte pair are the annotation type code, and the ten remaining bits (I) specify the time of the annotation, measured in sample intervals from the previous annotation (or from the beginning of the record for the first annotation). If 0 < A <= ACMAX, then A is defined in <ecg/ecgcodes.h>. Several other possibilities exist:

A = SKIP [59.]
    I = 0; the next four bytes are the interval in PDP-11 long integer format (the high 16 bits first, then the low 16 bits, with the low byte first in each pair). 
A = NUM [60.]
    I = annotation num field for current and subsequent annotations; otherwise, assume previous annotation num (initially 0). 
A = SUB [61.]
    I = annotation subtyp field for current annotation only; otherwise, assume subtyp = 0. 
A = CHN [62.]
    I = annotation chan field for current and subsequent annotations; otherwise, assume previous chan (initially 0). 
A = AUX [63.]
    I = number of bytes of auxiliary information (which is contained in the next I bytes); an extra null, not included in the byte count, is appended if I is odd. 
A = I = 0: End of file. 
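The byte layout described above can be sketched in Python roughly as follows. This is a minimal illustration and not MNE code: `decode_mit_annotations` is a hypothetical name, the sketch assumes that an AUX field belongs to the most recent annotation and that the SKIP interval is a signed 32-bit value (the excerpt states neither), and it ignores the num, subtyp, and chan fields.

```python
import struct

# Special type codes named in the spec excerpt above.
SKIP, NUM, SUB, CHN, AUX = 59, 60, 61, 62, 63


def decode_mit_annotations(data):
    """Decode MIT-format annotation bytes into a list of dicts.

    Hypothetical helper for illustration; not part of MNE-Python.
    """
    events = []
    sample = 0  # running time, in sample intervals from the record start
    pos = 0
    while pos + 2 <= len(data):
        # Each byte pair is little-endian (least significant byte first).
        (word,) = struct.unpack_from("<H", data, pos)
        pos += 2
        code = word >> 10     # six most significant bits (A)
        value = word & 0x3FF  # ten remaining bits (I)
        if code == 0 and value == 0:
            break  # A = I = 0 marks the end of the file
        if code == SKIP:
            # PDP-11 long: high 16 bits first, low byte first in each pair.
            high, low = struct.unpack_from("<HH", data, pos)
            pos += 4
            interval = (high << 16) | low
            if interval >= 1 << 31:  # assumption: the interval is signed
                interval -= 1 << 32
            sample += interval
        elif code == AUX:
            aux = bytes(data[pos:pos + value])
            pos += value + (value % 2)  # skip the pad byte when I is odd
            if events:  # assumption: AUX modifies the latest annotation
                events[-1]["aux"] = aux
        elif code in (NUM, SUB, CHN):
            continue  # num/subtyp/chan are ignored in this sketch
        else:
            # Ordinary annotation: I is the offset from the previous one.
            sample += value
            events.append({"sample": sample, "code": code, "aux": b""})
    return events
```

Each returned `sample` is a cumulative count of sample intervals, so dividing it by the recording's sampling frequency would give an onset in seconds.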

Copied from the wayback machine because the original page was timing out for me on the day I opened this issue.

drammock avatar Jun 13 '24 22:06 drammock

> My preference is to overload the existing reader function, and (for now) only triage based on .seizure extension (since it's the only one we know our users want supported). We can always later add to the list of file extensions that map to this format, if users ask us to.

Agreed and rather than mit_format I'd rather have fmt="auto" that you could set to "mit" (or any of the other supported formats).

larsoner avatar Jun 14 '24 19:06 larsoner

> Agreed and rather than mit_format I'd rather have fmt="auto" that you could set to "mit" (or any of the other supported formats).

If we triage based on file extension then IMO the extra param isn't necessary. But if in future triaging based on file extension becomes impractical then I agree fmt="auto" | "mit" is better than what I suggested.
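For illustration, the two triage options discussed here (file extension by default, with an explicit fmt override) could be combined along these lines. This is only a sketch: `resolve_annotation_format` and `_MIT_EXTENSIONS` are hypothetical names, not existing MNE API, and `.seizure` is the only extension taken from this thread.

```python
from pathlib import Path

# Hypothetical table; ".seizure" is the only extension named in this thread.
_MIT_EXTENSIONS = {".seizure"}


def resolve_annotation_format(fname, fmt="auto"):
    """Return the name of the annotation format to use for ``fname``.

    Hypothetical helper for illustration; not part of MNE-Python.
    """
    if fmt != "auto":
        # An explicit format wins, whatever the file extension is.
        return fmt
    ext = Path(fname).suffix.lower()
    if ext in _MIT_EXTENSIONS:
        return "mit"
    raise ValueError(f"Cannot infer annotation format from extension {ext!r}")
```

With this shape, supporting another extension later would only mean adding it to the table, and a file with an unexpected extension could still be read by passing fmt="mit".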

drammock avatar Jun 14 '24 19:06 drammock

Wow this would've helped my life a long time ago :p. I always thought those were junk files! Perhaps some ppl in my old lab are interested.

adam2392 avatar Jun 27 '24 20:06 adam2392

Hello All! It looks like there is already a Python library that can read these annotations: https://wfdb.readthedocs.io/en/latest/wfdb.html#wfdb-annotations. It matches the annotations on PhysioNet.

Maybe it can be incorporated into MNE? Alternatively, I don't mind implementing it myself without using the package.

withmywoessner avatar Dec 11 '24 21:12 withmywoessner