DiViMe
Basic format discussion (RTTM)
Pros of RTTM
- standard
- well defined
- validated over a very long time
- richest of the formats considered
- we could use a single evaluation script for all diarization tasks, including SAD
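A minimal sketch of why one RTTM-based evaluation can also cover SAD: speech regions can be derived from any diarization RTTM by merging all speaker turns. It assumes the standard 10-field RTTM `SPEAKER` record (type, file, channel, tbeg, tdur, ortho, stype, name, conf, slat) and turns from a single file; the helper names and speaker labels (`CHI`, `FEM`) are hypothetical, not DiViMe's code.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    file_id: str
    onset: float      # RTTM "tbeg", in seconds
    duration: float   # RTTM "tdur", in seconds
    speaker: str      # RTTM "name" (speaker ID) field


def parse_rttm(lines):
    """Parse SPEAKER records; other record types and blank lines are skipped."""
    turns = []
    for line in lines:
        fields = line.split()
        if not fields or fields[0] != "SPEAKER":
            continue
        # RTTM field order: type file chnl tbeg tdur ortho stype name conf slat
        turns.append(Turn(fields[1], float(fields[3]), float(fields[4]), fields[7]))
    return turns


def speech_regions(turns):
    """Merge overlapping or touching turns (any speaker) into speech regions.

    Assumes all turns come from the same file.
    """
    merged = []
    for turn in sorted(turns, key=lambda t: t.onset):
        start, end = turn.onset, turn.onset + turn.duration
        if merged and start <= merged[-1][1]:
            # Overlaps or touches the previous region: extend it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [(start, end) for start, end in merged]


rttm = [
    "SPEAKER f1 1 0.00 1.50 <NA> <NA> CHI <NA> <NA>",
    "SPEAKER f1 1 1.25 0.75 <NA> <NA> FEM <NA> <NA>",
    "SPEAKER f1 1 3.00 1.00 <NA> <NA> CHI <NA> <NA>",
]
print(speech_regions(parse_rttm(rttm)))  # [(0.0, 2.0), (3.0, 4.0)]
```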
Cons of RTTM in DiViMe
- poorly implemented in DiViMe right now (but this should be fixed anyway)
- we could follow the convention, but for the transcript part all time stamps would be asterisks
- not easy for target users to read (but this could be fixed by generating other formats from it)
- more complete than needed for some tasks (e.g., speech detection)
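The readability issue could be handled by deriving a simpler table from RTTM. A minimal sketch, assuming a CSV with onset/offset/speaker columns is what target users would want (the column choice and the function name are hypothetical):

```python
import csv
import io


def rttm_to_csv(rttm_text):
    """Convert RTTM SPEAKER records into a file/onset/offset/speaker CSV table."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["file", "onset", "offset", "speaker"])
    for line in rttm_text.splitlines():
        f = line.split()
        if not f or f[0] != "SPEAKER":
            continue  # skip blank lines and non-SPEAKER records
        onset, duration = float(f[3]), float(f[4])
        # RTTM stores a duration; users usually find an offset (end time) easier to read.
        writer.writerow([f[1], f"{onset:.2f}", f"{onset + duration:.2f}", f[7]])
    return out.getvalue()
```

Because the CSV is generated from RTTM, RTTM stays the single source of truth for evaluation.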
We discussed alternative formats:
- NIST STM (for transcriptions): file, channel, speaker, begin time, end time (STM uses end time, not duration), category [male, far, ... properties], transcription
- NIST CTM (for phones)
- WCE: its own format
- NOTE: supporting more formats means more complexity
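For reference, a sketch of reading one STM line, assuming the sclite layout `file channel speaker begin end [<label>] transcript`, where the optional label in angle brackets carries the category properties and lines starting with `;;` are comments. The example label and speaker values are made up:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StmSegment:
    file_id: str
    channel: str
    speaker: str
    begin: float              # seconds
    end: float                # seconds (an end time, not a duration)
    label: Optional[str]      # contents of the optional <...> category field
    transcript: str


def parse_stm_line(line):
    """Parse one STM line; returns None for blank lines and ';;' comments."""
    if not line.strip() or line.startswith(";;"):
        return None
    # The first five fields are whitespace-separated; the rest is label + transcript.
    parts = line.split(maxsplit=5)
    file_id, channel, speaker, begin, end = parts[:5]
    rest = parts[5] if len(parts) > 5 else ""
    label = None
    if rest.startswith("<") and ">" in rest:
        close = rest.index(">")
        label = rest[1:close]
        rest = rest[close + 1:].strip()
    return StmSegment(file_id, channel, speaker, float(begin), float(end), label, rest)
```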
CONCLUSION:
- [ ] fix our use of RTTM and make it the standard for SAD/VAD, talker diarization, role diarization, and VCM, so that all use the same eval scripts
- [ ] for WCE, as well as for the inputs to these tools, we'll use STM: http://www1.icsi.berkeley.edu/Speech/docs/sctk-1.2/infmts.htm#stm_fmt_name_0
- we are still waiting to see how we will evaluate WCE
Note: check Coconut for conversion between formats
After further discussion, we decided that VCM will also write its output to the "speaker ID" column of RTTM, and thus use the evaluation from evalDiar.
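A minimal sketch of what that decision means for VCM output: each classified segment becomes an RTTM `SPEAKER` line whose speaker-name field (8th column) holds the vocalization class, so diarization scoring can treat classes like speakers. The function name, the fixed channel `1`, and the class label `CNS` are placeholders, not the actual VCM labels or code:

```python
def vcm_to_rttm(file_id, segments, channel="1"):
    """Write (onset, duration, class) segments as RTTM SPEAKER lines.

    The VCM class goes into the speaker-name field; unused fields are <NA>.
    """
    lines = []
    for onset, duration, vcm_class in segments:
        fields = ["SPEAKER", file_id, channel, f"{onset:.3f}", f"{duration:.3f}",
                  "<NA>", "<NA>", vcm_class, "<NA>", "<NA>"]
        lines.append(" ".join(fields))
    return lines


print(vcm_to_rttm("f1", [(0.5, 1.25, "CNS")]))
# ['SPEAKER f1 1 0.500 1.250 <NA> <NA> CNS <NA> <NA>']
```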