pyannote-core Make Annotation.write_rttm follow most of RTTM specs

The method write_rttm() only allows the type SPEAKER in the first field of the RTTM file.

This pull request adds the type NON-SPEECH in field 1 when the label of the segment is one of the three subtypes allowed by the RTTM File Format Specification (noise, music, or other).

Jan 05 '22 13:01 JMasr

Thanks. Would you mind sharing a link to the RTTM file format specification?

Jan 05 '22 13:01 hbredin

Thank you for building and sharing this project. Of course: you can find the RTTM File Format Specification in Appendix A of this NIST paper.

Jan 05 '22 14:01 JMasr

(sorry for the delay in getting back to you)

It looks like RTTM files may contain much more than just SPEAKER and NON-SPEECH (column Type of Table A.2). Also, there is no clear correspondence between pyannote.core.Annotation labels and the RTTM type, subtype, and name fields.

Therefore, unless you convince me otherwise and we find a way to really map Annotation to the RTTM specs, I probably won't merge this PR.

Jan 18 '22 13:01 hbredin

Hi, @hbredin. Don't worry about the delay, and thanks for taking the time to answer back.

I'm with you. Maybe this request is too limited. I think pyannote.core.Annotation is very useful for VAD, SAD, and speaker diarization. If we figure out a better way to map it to the RTTM specs, it could be equally useful for acoustic event detection or rich transcription.

The issue for me is that if the method pyannote.core.Annotation.write_rttm only writes the type SPEAKER, I can't include acoustic events such as music in the annotation. Maybe a refactoring that covers all the specs would be better. What do you think?

Jan 18 '22 23:01 JMasr

I'd definitely consider a PR that covers all the specs (or at least STT and MDE categories).

RTTM specs vs. `Annotation`

There is not a 100% correspondence between the RTTM specs and what Annotation can handle.

for segment, track, label in annotation.itertracks(yield_label=True):
    pass

| RTTM | `Annotation` |
| --- | --- |
| `type` | see below |
| `file` | `annotation.uri` |
| `chnl` | see below |
| `tbeg` | `segment.start` |
| `tdur` | `segment.duration` |
| `ortho` | N/A |
| `stype` | see below |
| `name` | `label` when `type` is `SPEAKER` |
| `conf` | N/A |

N/A = information is not provided by Annotation
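To make the table concrete, here is a minimal sketch of how one SPEAKER line could be assembled from the nine fields in the order listed above. The helper name `rttm_line` is hypothetical (it is not part of pyannote.core), and writing `1` for `chnl` and `<NA>` for the fields Annotation cannot provide are assumptions, not something the spec discussion above settles.

```python
def rttm_line(uri, start, duration, label):
    """Format one RTTM line for a SPEAKER turn (hypothetical helper).

    Field order follows the table: type, file, chnl, tbeg, tdur,
    ortho, stype, name, conf.
    """
    fields = [
        "SPEAKER",          # type
        str(uri),           # file  <- annotation.uri
        "1",                # chnl  (assumed default channel)
        f"{start:.3f}",     # tbeg  <- segment.start
        f"{duration:.3f}",  # tdur  <- segment.duration
        "<NA>",             # ortho (not provided by Annotation)
        "<NA>",             # stype (assumed <NA> for SPEAKER)
        str(label),         # name  <- label
        "<NA>",             # conf  (not provided by Annotation)
    ]
    return " ".join(fields)


print(rttm_line("file1", 0.5, 1.25, "alice"))
```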

About `type`

While track is used to differentiate two identical segments (think: perfect overlap between two speakers), we could repurpose it to also carry a cue about the RTTM type (while still allowing it to differentiate two identical segments). Note, however, that track is expected to be either a string or an int.

For instance, we could name tracks with the convention {type}_{original_track}, where type can be any type between LEXEME and SPEAKER (see column Type of Table A.2) and original_track keeps its original role of differentiating identical segments.
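The track convention could be parsed along these lines. This is only a sketch: `split_track` is a hypothetical helper, and falling back to SPEAKER for int tracks or strings without an underscore is an assumption made here to stay compatible with existing annotations.

```python
def split_track(track, default_type="SPEAKER"):
    """Split a '{type}_{original_track}' track into (type, original_track).

    Hypothetical helper. Tracks that do not follow the convention
    (ints, or strings without an underscore) are assumed to be SPEAKER.
    """
    if isinstance(track, str) and "_" in track:
        # Split on the first underscore only, so the original track
        # may itself contain underscores.
        rttm_type, original_track = track.split("_", 1)
        return rttm_type, original_track
    return default_type, track


print(split_track("NON-SPEECH_0"))
print(split_track(3))
```

One caveat of this design: an existing string track such as `spk_1` would be read as type `spk`, so a real implementation would probably need to validate the prefix against the types in Table A.2.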

About `subtype`

Once we infer type from track,

if type is A/P or SPEAKER, subtype should be "<NA>"
otherwise, subtype should be label.
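The two rules above amount to a one-line decision. A minimal sketch, with `infer_stype` as a hypothetical helper name:

```python
def infer_stype(rttm_type, label):
    """Return the RTTM subtype for a given type and Annotation label.

    Hypothetical helper implementing the rule above: A/P and SPEAKER
    have no subtype, every other type uses the label as subtype.
    """
    if rttm_type in ("A/P", "SPEAKER"):
        return "<NA>"
    return str(label)


print(infer_stype("SPEAKER", "alice"))
print(infer_stype("NON-SPEECH", "music"))
```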

About `chnl`

We could trick annotation.uri into containing channel information (e.g. using {file}:{chnl} convention)
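A possible way to read the {file}:{chnl} convention back out of the URI is sketched below. `split_uri` is a hypothetical helper, and defaulting to channel `1` when the URI carries no channel is an assumption.

```python
def split_uri(uri, default_chnl="1"):
    """Split a '{file}:{chnl}' URI into (file, chnl).

    Hypothetical helper. URIs without a colon are assumed to refer
    to the default channel.
    """
    # Split on the last colon so that file names containing colons
    # still yield the trailing part as the channel.
    file, sep, chnl = uri.rpartition(":")
    if sep:
        return file, chnl
    return uri, default_chnl


print(split_uri("meeting:2"))
print(split_uri("meeting"))
```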

What do you think?

Jan 19 '22 08:01 hbredin
