Make Annotation.write_rttm follow most of RTTM specs
The method write_rttm() only allows the type SPEAKER in the first field of the RTTM file.
This pull request adds the type NON-SPEECH in field 1 when the label of the segment is one of the three subtypes allowed by the RTTM File Format Specification (noise, music, or other).
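As a rough illustration of that rule (a sketch, not the actual diff of this PR), the choice of field 1 could look like the following. The helper name `rttm_type_for_label` and the constant `NON_SPEECH_SUBTYPES` are hypothetical, and falling back to SPEAKER for every other label is an assumption.

```python
# Subtypes named in this PR for the NON-SPEECH type.
NON_SPEECH_SUBTYPES = {"noise", "music", "other"}


def rttm_type_for_label(label):
    """Hypothetical helper: choose the RTTM type (field 1) from a segment label."""
    if label in NON_SPEECH_SUBTYPES:
        return "NON-SPEECH"
    # Assumption: any other label is treated as a speaker name.
    return "SPEAKER"
```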
Thanks. Would you mind sharing a link to the RTTM file format specification?
Thank you for sharing and building this project. Of course: you can find the RTTM File Format Specification in Appendix A of this NIST paper.
(sorry for the delay in getting back to you)
It looks like RTTM files may contain much more than just SPEAKER and NON-SPEECH (column Type of Table A.2).
Also, there is no clear correspondence between pyannote.core.Annotation labels and RTTM type, subtype, and name fields.
Therefore, unless you convince me otherwise and we find a way to really map Annotation to the RTTM specs, I probably won't merge this PR.
Hi, @hbredin. Don't worry about the delay, and thanks for taking the time to answer back.
I'm with you. Maybe this request is too limited. I think pyannote.core.Annotation is very useful for VAD, SAD, and speaker diarization. If we figure out a better way to map it to the RTTM specs, it could be equally useful for acoustic event detection or rich transcription.
The issue for me is that, as long as pyannote.core.Annotation.write_rttm only writes the type SPEAKER, I can't include acoustic events such as music in the annotation. Maybe a refactoring that covers all the specs would be better. What do you think?
I'd definitely consider a PR that covers all the specs (or at least STT and MDE categories).
RTTM specs vs. Annotation
There is no 100% correspondence between RTTM specs and what Annotation can handle.

    for segment, track, label in annotation.itertracks(yield_label=True):
        pass

| RTTM | Annotation |
|---|---|
| type | see below |
| file | annotation.uri |
| chnl | see below |
| tbeg | segment.start |
| tdur | segment.duration |
| ortho | N/A |
| stype | see below |
| name | label when type is SPEAKER |
| conf | N/A |

N/A = information is not provided by Annotation
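
To make the table concrete, here is a minimal sketch that builds one RTTM line from the nine fields above, in that order. It takes plain values rather than an Annotation so it runs on its own. The function name `format_rttm_line`, the default channel `"1"`, and the three-decimal time format are assumptions for illustration, not part of the spec or of pyannote.core.

```python
NA = "<NA>"  # placeholder for fields that Annotation does not provide


def format_rttm_line(uri, start, duration, label,
                     rttm_type="SPEAKER", chnl="1", stype=NA):
    """Hypothetical helper: build one RTTM line following the table above."""
    # name carries the label only when type is SPEAKER.
    name = label if rttm_type == "SPEAKER" else NA
    fields = [
        rttm_type,           # type
        uri,                 # file  <- annotation.uri
        chnl,                # chnl  (assumed default: "1")
        f"{start:.3f}",      # tbeg  <- segment.start
        f"{duration:.3f}",   # tdur  <- segment.duration
        NA,                  # ortho (N/A)
        stype,               # stype
        name,                # name
        NA,                  # conf  (N/A)
    ]
    return " ".join(str(field) for field in fields)
```

For example, `format_rttm_line("file1", 1.0, 2.5, "alice")` gives `SPEAKER file1 1 1.000 2.500 <NA> <NA> alice <NA>`.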
About type
While track is used to differentiate two identical segments (think: perfect overlap between two speakers), we could try to divert its use to provide a cue about what type it is (while still allowing it to differentiate two identical segments). Note, however, that track is expected to be either a string or an int.
For instance, we could use track with the following convention {type}_{original_track}, where type can be any type between LEXEME and SPEAKER (see column Type of Table A.2) and original_track keeps the original role of differentiating identical segments.
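
A minimal sketch of how that convention could be decoded, assuming that int tracks and tracks without a recognized prefix fall back to SPEAKER. `split_track` and `KNOWN_TYPES` are hypothetical names, and the set below is deliberately partial (only types mentioned in this thread); the full list would come from Table A.2.

```python
# Partial set for illustration; the complete list is in Table A.2 of the spec.
KNOWN_TYPES = {"LEXEME", "NON-SPEECH", "A/P", "SPEAKER"}


def split_track(track, default_type="SPEAKER"):
    """Hypothetical helper: split '{type}_{original_track}' into its two parts."""
    if isinstance(track, str):
        # Split on the first underscore only, so original_track may contain "_".
        prefix, sep, rest = track.partition("_")
        if sep and prefix in KNOWN_TYPES:
            return prefix, rest
    # Assumption: ints and unprefixed strings keep today's behavior (SPEAKER).
    return default_type, track
```

Splitting on the first underscore works for the types listed here because none of them contains an underscore.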
About subtype
Once we infer type from track,
- if type is A/P or SPEAKER, subtype should be "<NA>"
- otherwise, subtype should be label.
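
The subtype rule above could be sketched as follows. `infer_stype` is a hypothetical name, and converting the label with `str()` is an assumption.

```python
NA = "<NA>"


def infer_stype(rttm_type, label):
    """Hypothetical helper: derive the RTTM stype field from type and label."""
    if rttm_type in ("A/P", "SPEAKER"):
        return NA
    # For every other type, the label is used as the subtype.
    return str(label)
```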
About chnl
We could trick annotation.uri into containing channel information (e.g. using a {file}:{chnl} convention).
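
A minimal sketch of decoding such a uri, assuming the channel is the all-digit suffix after the last colon and defaults to "1" when absent. `split_uri` is a hypothetical name and both choices are assumptions.

```python
def split_uri(uri, default_chnl="1"):
    """Hypothetical helper: split '{file}:{chnl}' into (file, chnl)."""
    # Use the last colon so that file identifiers may themselves contain ":".
    file_id, sep, chnl = uri.rpartition(":")
    if sep and file_id and chnl.isdigit():
        return file_id, chnl
    # No channel suffix found: keep the uri as-is and use the default channel.
    return uri, default_chnl
```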
What do you think?