pyannote-core

Make Annotation.write_rttm follow most of the RTTM specs

Open · JMasr opened this issue 3 years ago · 5 comments

The method write_rttm() only allows the type SPEAKER in the first field of the RTTM file.

This pull request adds the type NON-SPEECH in field 1 when the segment label is one of the three subtypes that the RTTM file format specification allows for it (noise, music, or other).
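A minimal sketch of the rule this PR proposes, assuming labels are compared case-insensitively against the three NON-SPEECH subtypes and that every other label keeps being written as SPEAKER. The helper name `rttm_type_and_subtype` is hypothetical, not part of pyannote.core:

```python
from typing import Tuple

# Subtypes the RTTM specification allows for the NON-SPEECH type.
NON_SPEECH_SUBTYPES = {"noise", "music", "other"}


def rttm_type_and_subtype(label: str) -> Tuple[str, str]:
    """Return the (type, stype) RTTM fields proposed for a segment label.

    Hypothetical helper: a label matching a NON-SPEECH subtype (compared
    case-insensitively, an assumption) yields a NON-SPEECH line; any other
    label stays a SPEAKER line, with stype written as <NA>.
    """
    subtype = label.lower()
    if subtype in NON_SPEECH_SUBTYPES:
        return "NON-SPEECH", subtype
    return "SPEAKER", "<NA>"
```

For example, `rttm_type_and_subtype("music")` gives `("NON-SPEECH", "music")`, while a speaker label such as `"spk1"` gives `("SPEAKER", "<NA>")`.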

JMasr · Jan 05 '22 13:01

Thanks. Would you mind sharing a link to the RTTM file format specification?

hbredin · Jan 05 '22 13:01

Thank you for sharing and building this project. Of course: the RTTM file format specification is in Appendix A of this NIST paper.

JMasr · Jan 05 '22 14:01

(sorry for the delay in getting back to you)

It looks like RTTM files may contain much more than just SPEAKER and NON-SPEECH (see column Type of Table A.2). Also, there is no clear correspondence between pyannote.core.Annotation labels and the RTTM type, subtype, and name fields.

Therefore, unless you convince me otherwise and we find a way to really map Annotation to the RTTM specs, I probably won't merge this PR.

hbredin · Jan 18 '22 13:01

Hi, @hbredin. Don't worry about the delay, and thanks for taking the time to answer back.

I'm with you. Maybe this request is too limited. I think pyannote.core.Annotation is very useful for VAD, SAD, and speaker diarization. If we figure out a way to map it more closely to the RTTM specs, it could be equally useful for acoustic event detection or rich transcription.

The problem for me is that, since pyannote.core.Annotation.write_rttm only writes the type SPEAKER, I can't include acoustic events such as music in the annotation. Maybe a refactoring that covers all the specs would be better. What do you think?

JMasr · Jan 18 '22 23:01

I'd definitely consider a PR that covers all the specs (or at least STT and MDE categories).

RTTM specs vs. Annotation

There is not a 100% correspondence between the RTTM specs and what Annotation can handle.

```python
# Each iteration yields one (segment, track, label) triple of the Annotation.
for segment, track, label in annotation.itertracks(yield_label=True):
    pass
```
| RTTM field | Annotation |
|---|---|
| type | see below |
| file | annotation.uri |
| chnl | see below |
| tbeg | segment.start |
| tdur | segment.duration |
| ortho | N/A |
| stype | see below |
| name | label when type is SPEAKER |
| conf | N/A |

N/A = information is not provided by Annotation
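The table can be read as a line template. The sketch below fills one SPEAKER line from the values that `annotation.itertracks(yield_label=True)` provides. To stay self-contained it takes plain `start`/`end` floats instead of a pyannote `Segment`, hard-codes chnl to 1 (an assumption for single-channel audio), and writes `<NA>` for the fields Annotation cannot provide. It also appends a tenth field, the signal look-ahead time (slat) from the NIST specification, which the table omits, as `<NA>`. The helper name `speaker_rttm_line` is hypothetical:

```python
def speaker_rttm_line(uri: str, start: float, end: float, label: str) -> str:
    """Build one RTTM SPEAKER line following the field mapping table.

    Hypothetical helper; start/end stand in for segment.start/segment.end.
    """
    duration = end - start  # tdur corresponds to segment.duration
    fields = [
        "SPEAKER",          # type
        uri,                # file  <- annotation.uri
        "1",                # chnl  (assumed single channel)
        f"{start:.3f}",     # tbeg  <- segment.start
        f"{duration:.3f}",  # tdur  <- segment.duration
        "<NA>",             # ortho (not provided by Annotation)
        "<NA>",             # stype (<NA> for SPEAKER)
        label,              # name  <- label
        "<NA>",             # conf  (not provided by Annotation)
        "<NA>",             # slat  (tenth field, unused here)
    ]
    return " ".join(fields)
```

For example, `speaker_rttm_line("meeting", 1.5, 3.25, "spk1")` returns `"SPEAKER meeting 1 1.500 1.750 <NA> <NA> spk1 <NA> <NA>"`.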

About type

While track is used to differentiate two identical segments (think: perfect overlap between two speakers), we could try to repurpose it to also provide a cue about the segment's type (while still allowing it to differentiate identical segments). Note, however, that track is expected to be either a string or an int.

For instance, we could give track the form {type}_{original_track}, where type can be any type from LEXEME to SPEAKER (see column Type of Table A.2) and original_track keeps its original role of differentiating identical segments.
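A sketch of the {type}_{original_track} convention, assuming the type is everything before the first underscore (that is, assuming none of the types from LEXEME to SPEAKER contains an underscore). `encode_track`, `decode_track`, the optional `known_types` filter, and the fallback to SPEAKER for tracks that do not follow the convention are all hypothetical choices, not pyannote.core API:

```python
from typing import Optional, Set, Tuple, Union

# A pyannote track is expected to be either a string or an int.
Track = Union[str, int]


def encode_track(rttm_type: str, original_track: Track) -> str:
    """Pack an RTTM type into a track name as '{type}_{original_track}'."""
    return f"{rttm_type}_{original_track}"


def decode_track(
    track: Track,
    known_types: Optional[Set[str]] = None,
    default_type: str = "SPEAKER",
) -> Tuple[str, str]:
    """Recover (type, original_track) from a track built by encode_track.

    Splits on the first underscore only, so an original track that itself
    contains underscores survives intact. If known_types is given, the prefix
    must be one of them. Tracks that do not follow the convention (plain ints,
    or strings without a recognised prefix) fall back to default_type, which
    is an assumption rather than part of the proposal.
    """
    if isinstance(track, str) and "_" in track:
        rttm_type, original_track = track.split("_", 1)
        if known_types is None or rttm_type in known_types:
            return rttm_type, original_track
    return default_type, str(track)
```

One side effect of this design: an int track comes back as a string after the round trip, so `decode_track(encode_track("NON-SPEECH", 0))` gives `("NON-SPEECH", "0")`.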

About subtype

Once we infer type from track,

  • if type is A/P or SPEAKER, subtype should be "<NA>"
  • otherwise, subtype should be label.
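The subtype rule above fits in a few lines. This sketch assumes type has already been inferred from track; the function name `rttm_subtype` is hypothetical:

```python
def rttm_subtype(rttm_type: str, label: str) -> str:
    """Return the RTTM stype field for a segment of the given type.

    A/P and SPEAKER lines carry no subtype, so they get "<NA>";
    for every other type the Annotation label is used as the subtype.
    """
    if rttm_type in ("A/P", "SPEAKER"):
        return "<NA>"
    return label
```

With this rule, a NON-SPEECH segment labeled `music` is written with stype `music`, while a SPEAKER segment keeps its label only in the name field.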

About chnl

We could trick annotation.uri into containing channel information (e.g., using a {file}:{chnl} convention).
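One way to implement the {file}:{chnl} trick, as a hypothetical helper: split annotation.uri on its last colon, so file names that themselves contain a colon still work, and fall back to channel 1 (an assumption matching single-channel audio) when the uri carries no channel:

```python
from typing import Tuple


def split_uri(uri: str, default_chnl: str = "1") -> Tuple[str, str]:
    """Split a uri of the form '{file}:{chnl}' into (file, chnl).

    Uses the last colon as separator; a uri without any colon is treated
    as a single-channel file with the assumed default channel.
    """
    file, sep, chnl = uri.rpartition(":")
    if not sep:
        # rpartition found no colon: the whole uri is the file name.
        return uri, default_chnl
    return file, chnl
```

For example, `split_uri("meeting:2")` gives `("meeting", "2")` and `split_uri("meeting")` gives `("meeting", "1")`.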

What do you think?

hbredin · Jan 19 '22 08:01