
basic format discussion (rttm)

Open alecristia opened this issue 7 years ago • 1 comments

Pros of rttm

  • standard
  • well defined
  • validated over a very long time
  • richest format
  • we could use a single evaluation for all diarization tasks, including SAD

Cons of rttm in DiViMe

  • poorly implemented right now (but this should be fixed anyway)
  • we could follow the convention, but for the transcript part all time stamps would carry an asterisk
  • not easy for target users to read (but this could be fixed by generating other formats)
  • more detailed than necessary for several tasks (e.g. speech detection)
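
To make the trade-off concrete, here is a minimal Python sketch of writing and reading rttm lines. It assumes the ten-field `SPEAKER` layout accepted by common diarization scoring tools (type, file, channel, onset, duration, orthography, speaker type, speaker name, confidence, lookahead), with `<NA>` in unused fields. The `Turn` type, the helper names, and the labels `speech` and `CHI` are hypothetical and not taken from DiViMe.

```python
from typing import NamedTuple


class Turn(NamedTuple):
    file_id: str
    onset: float      # seconds from start of recording
    duration: float   # seconds
    label: str        # goes in the speaker-name column


def to_rttm_line(turn: Turn) -> str:
    """Format one turn as a ten-field SPEAKER line (assumed layout)."""
    fields = [
        "SPEAKER", turn.file_id, "1",
        f"{turn.onset:.3f}", f"{turn.duration:.3f}",
        "<NA>", "<NA>", turn.label, "<NA>", "<NA>",
    ]
    return " ".join(fields)


def parse_rttm_line(line: str) -> Turn:
    """Read back the fields an evaluation script would need."""
    parts = line.split()
    if len(parts) != 10 or parts[0] != "SPEAKER":
        raise ValueError(f"not a ten-field SPEAKER line: {line!r}")
    return Turn(parts[1], float(parts[3]), float(parts[4]), parts[7])


# A speech detection segment and a talker diarization turn share one
# layout; only the label in the speaker-name column differs.
sad = Turn("rec01", 0.5, 1.25, "speech")   # hypothetical SAD label
diar = Turn("rec01", 0.5, 1.25, "CHI")     # hypothetical talker label
print(to_rttm_line(sad))
print(to_rttm_line(diar))
```

Under this layout a speech detection line uses only four informative fields out of ten, which is the "more than necessary" cost; the benefit is that one parser and one scorer can serve every task.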

We discussed alternative formats:

  • stm (NIST, for transcriptions): file, channel, speaker, beg, (dur/end), category [male, far... properties], transcription
  • ctm (NIST, for phones)
  • WCE: own format
  • NOTE: including more formats means more complexity
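
For comparison, a minimal sketch of reading one stm line in Python. It assumes the SCTK layout of file, channel, speaker, begin time, end time, an optional label in angle brackets, then the transcript, with `;;` marking comment lines. The `StmSegment` type, the function name, and the sample values are hypothetical.

```python
from typing import NamedTuple, Optional


class StmSegment(NamedTuple):
    file_id: str
    channel: str
    speaker: str
    begin: float            # seconds
    end: float              # seconds
    label: Optional[str]    # contents of the optional <...> field
    transcript: str


def parse_stm_line(line: str) -> Optional[StmSegment]:
    """Parse one stm line; return None for blank and ';;' comment lines."""
    line = line.strip()
    if not line or line.startswith(";;"):
        return None
    parts = line.split(maxsplit=5)
    if len(parts) < 5:
        raise ValueError(f"too few fields in stm line: {line!r}")
    file_id, channel, speaker, begin, end = parts[:5]
    rest = parts[5] if len(parts) > 5 else ""
    label = None
    # The label field is optional and, when present, is wrapped in <...>.
    if rest.startswith("<") and ">" in rest:
        tag, _, rest = rest.partition(">")
        label = tag[1:]
        rest = rest.strip()
    return StmSegment(file_id, channel, speaker,
                      float(begin), float(end), label, rest)


seg = parse_stm_line("rec01 A MOT 0.50 1.75 <o,f0,female> look at the ball")
print(seg)
```

Unlike rttm, the transcript sits on the same line as the time stamps, which is why stm is the more natural fit for transcription-based tasks.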

CONCLUSION:

  • [ ] fix our use of rttm and make it the standard for SAD/VAD, talker and role diarization, and VCM, so that all of them use the same eval scripts
  • [ ] for WCE, as well as for the input to these, we'll use stm: http://www1.icsi.berkeley.edu/Speech/docs/sctk-1.2/infmts.htm#stm_fmt_name_0
  • we are still waiting to see how we will evaluate WCE

Note: Check Coconut for conversion across formats

alecristia, Nov 29 '18 15:11

After further discussion, we decided that VCM will also write its output to the "speaker ID" column, and thus use the eval from evalDiar.

alecristia, Nov 29 '18 19:11