kaldi make_rttm.py issue

Open dsmiller opened this issue 6 years ago • 16 comments

Here's an MWE of a problem in callhome_diarization/v1/local/make_rttm.py. The problem arises when the script tries to merge overlapping segments: it cannot handle cases where one segment is a subsegment of another (e.g., if a speaker talks briefly within another speaker's longer segment). Specifically, given two consecutive utterances u1 and u2, if (u1.end + u2.begin)/2 > u2.end, the resulting RTTM will contain a negative duration.

$ cd $KALDI/egs/callhome_diarization/v1
$ echo -e "utt1 reco1 0 10\nutt2 reco1 3 5" > tmp.segments
$ echo -e "utt1 0\nutt2 1" > tmp.labels
$ cat tmp.segments 
utt1 reco1 0 10
utt2 reco1 3 5
$ cat tmp.labels 
utt1 0
utt2 1
$ ./diarization/make_rttm.py tmp.segments tmp.labels tmp.rttm
$ cat tmp.rttm 
SPEAKER reco1 0   0.000   6.500 <NA> <NA> 0 <NA> <NA>
SPEAKER reco1 0   6.500  -1.500 <NA> <NA> 1 <NA> <NA>
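
The numbers in that output can be reproduced by hand. This is a minimal sketch, assuming the boundary between two consecutive segments is placed at the midpoint (u1.end + u2.begin)/2 as in the condition above; the variable names are illustrative and not taken from make_rttm.py.

```python
# Segments from the example above: (start, end, speaker label).
u1 = (0.0, 10.0, 0)
u2 = (3.0, 5.0, 1)

# Assumed rule: the hard boundary sits halfway between u1's end and u2's start.
boundary = (u1[1] + u2[0]) / 2

# The first turn runs from u1's start to the boundary,
# the second from the boundary to u2's end.
first_duration = boundary - u1[0]
second_duration = u2[1] - boundary

print(f"speaker {u1[2]}: start {u1[0]:.3f}, duration {first_duration:.3f}")
print(f"speaker {u2[2]}: start {boundary:.3f}, duration {second_duration:.3f}")
```

Because the boundary (6.5) falls after u2's end (5.0), the second duration comes out negative.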

Mar 09 '18 14:03 dsmiller

The script egs/wsj/s5/steps/segmentation/convert_utt2spk_and_segments_to_rttm.py might be applicable to your case. You can use SCTK's rttmSmooth.pl -s 0 to merge nearby same-speaker segments if needed.

Mar 12 '18 23:03 vimalmanohar

@mmaciej2 you might want to look into this at some point.

Apr 03 '18 14:04 david-ryan-snyder

@mmaciej2 when you have a chance, could you think about what we should do with this issue?

Apr 26 '18 22:04 david-ryan-snyder

@dsmiller It seems to me that the issue here is that you are misunderstanding the usage of this script. I will clarify what this particular make_rttm.py script is for. In a sense, it is to remove "fuzzy" speaker change boundaries.

What this script is for is to produce an rttm file from a sliding-window diarization system. More specifically, the way it is "handling overlap" is to place hard speaker boundaries where we detect a speaker change (i.e. two adjacent segments have different speaker labels). But, due to using a sliding window, there will be overlap between the adjacent segments, which comes about not because of any detected overlapping speech, but just as an artifact from the sliding-window method.

One reason the script fails in the case you described is that it is a somewhat nonsensical setup for this script's purpose. The script is designed to produce output that contains no overlapping speech, and it is unclear what the correct way to handle a segment being entirely contained within another segment would be.
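
To make the intended use concrete, here is a rough sketch of turning overlapping sliding-window segments into hard, non-overlapping speaker turns. It is not the code from make_rttm.py: the function name `hard_boundaries` and the midpoint rule for placing the boundary are assumptions for illustration.

```python
def hard_boundaries(segments):
    """segments: list of (start, end, label) for one recording, sorted by start.

    Returns non-overlapping (start, end, label) turns.
    """
    turns = []
    for start, end, label in segments:
        if turns and turns[-1][2] == label and start <= turns[-1][1]:
            # Same speaker, touching or overlapping windows: extend the turn.
            prev_start, prev_end, _ = turns[-1]
            turns[-1] = (prev_start, max(prev_end, end), label)
        elif turns and start < turns[-1][1]:
            # Speaker change with overlapping windows: split at the midpoint
            # of the overlap (assumed rule).
            prev_start, prev_end, prev_label = turns[-1]
            mid = (prev_end + start) / 2
            turns[-1] = (prev_start, mid, prev_label)
            turns.append((mid, end, label))
        else:
            turns.append((start, end, label))
    return turns


# 1.5 s windows shifted by 0.75 s, with a speaker change in the middle.
windows = [(0.0, 1.5, "A"), (0.75, 2.25, "A"), (1.5, 3.0, "B"), (2.25, 3.75, "B")]
turns = hard_boundaries(windows)
print(turns)
```

With input like this, every overlap is an artifact of the window shift, so a single boundary per speaker change is well defined. A segment fully inside another has no such natural boundary.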

Apr 27 '18 19:04 mmaciej2

I ran into this problem while creating a diarization test set. I had multiple single-channel files, each containing one side of a conversation (like much of LDC's data). To create a diarization test set, I mixed the audio channels back together and ran VAD on the individual channels to create reference labels (some of which overlap or are proper subsegments). So some labels needed to be dropped or altered.

I solved the problem by dropping segments that were proper subsegments. But I assume other people will find themselves in similar situations, so it may be useful to have a robust script for this purpose.
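
For anyone who wants the same workaround, a minimal sketch of dropping proper subsegments could look like the following. The helper name `drop_contained` is hypothetical, and this is only one way to do the filtering.

```python
def drop_contained(segments):
    """segments: list of (start, end, label) for one recording.

    Drops every segment that lies entirely inside a different, larger segment.
    Exact duplicates are kept, since neither is a proper subsegment.
    """
    kept = []
    for i, (start, end, label) in enumerate(segments):
        contained = any(
            j != i
            and o_start <= start
            and end <= o_end
            and (o_start, o_end) != (start, end)
            for j, (o_start, o_end, _) in enumerate(segments)
        )
        if not contained:
            kept.append((start, end, label))
    return kept


filtered = drop_contained([(0.0, 10.0, 0), (3.0, 5.0, 1), (9.0, 12.0, 1)])
print(filtered)
```

Note that this throws away real speech from the contained segment, so it changes the reference labels rather than just reformatting them.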

Apr 27 '18 19:04 dsmiller

I'm not entirely sure what it is you are trying to do. I have created diarization test sets by mixing individual channels in the past, and created the "ground truth" rttm file with some very basic text processing. I did not do any special processing—it was essentially just concatenating the different channel label references together and converting it into the rttm file format. Is there some kind of segment processing you want to do?
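
As an illustration of that kind of basic text processing, here is a sketch that concatenates per-channel label lists into RTTM lines without touching the overlaps. The function name `to_rttm_lines` and the input layout are assumptions; the field layout copies the RTTM lines shown earlier in this thread.

```python
def to_rttm_lines(reco, channel_segments):
    """channel_segments: dict mapping a speaker name to a list of (start, end).

    Returns RTTM lines sorted by start time. Overlapping speech is kept as-is.
    """
    rows = []
    for speaker, segs in channel_segments.items():
        for start, end in segs:
            rows.append((start, end - start, speaker))
    rows.sort()
    return [
        f"SPEAKER {reco} 0 {start:7.3f} {dur:7.3f} <NA> <NA> {speaker} <NA> <NA>"
        for start, dur, speaker in rows
    ]


lines = to_rttm_lines("reco1", {"A": [(0.0, 10.0)], "B": [(3.0, 5.0)]})
print("\n".join(lines))
```

Since a reference RTTM is allowed to contain overlapping speech, nothing needs to be merged or dropped in this approach.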

Apr 27 '18 19:04 mmaciej2

Hi @mmaciej2, I ran into this issue after I used extract_xvectors.sh.

The resulting segments have overlapping parts.

I think this is because the subsegments overlap. They were created by the get_uniform_subsegments.py command. See https://github.com/kaldi-asr/kaldi/blob/master/egs/callhome_diarization/v1/diarization/nnet3/xvector/extract_xvectors.sh#L108

The segments were copied from subsegments_data at the end of the extract_xvectors.sh script: https://github.com/kaldi-asr/kaldi/blob/master/egs/callhome_diarization/v1/diarization/nnet3/xvector/extract_xvectors.sh#L144

Oct 16 '18 18:10 oplatek

@oplatek,

The output of extract_xvectors.sh should contain overlapping segments. The extract_xvectors.sh script (and more specifically the get_uniform_subsegments.py script) takes in non-overlapping segmentation (i.e. from speech activity detection) and produces overlapping subsegments.
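
A simplified sketch of uniform sliding-window subsegmentation shows where the overlap comes from. This is not the logic of get_uniform_subsegments.py; the function name, the `window` and `period` parameters, their values, and the handling of the tail are all assumptions for illustration.

```python
def uniform_subsegments(start, end, window=1.5, period=0.75):
    """Cut one speech segment into windows of `window` seconds, shifted by
    `period` seconds. The last window is clipped to the segment end."""
    subs = []
    t = start
    while t < end:
        subs.append((t, min(t + window, end)))
        if t + window >= end:
            # This window already reaches the segment end, so stop here
            # rather than emit a shorter window inside it.
            break
        t += period
    return subs


subs = uniform_subsegments(0.0, 3.0)
print(subs)
```

Adjacent windows overlap, but both the start and the end times strictly increase, so no window is contained in another.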

The make_rttm.py script will break if it is given a segment that is entirely contained within another segment. This should not be possible with the output of extract_xvectors.sh unless the input to that script is incorrect. It is very important that the input segmentation to extract_xvectors.sh reflects true speech activity detection segmentation rather than ASR segmentation, where you can have overlapping segments due to multiple people speaking. But in general we should not have that information for diarization, since that is part of the diarization task.

Oct 16 '18 19:10 mmaciej2

@mmaciej2,

Would it be possible to add a test for this bad input in make_rttm.py? You could print out an error explaining why it's failing. I think a few people have run into this issue now.
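
Something along these lines might work as the check. It is only a sketch: the function name `check_segments` and the error messages are made up here, and it assumes the segments of one recording are visited in file order.

```python
def check_segments(segments):
    """segments: list of (start, end) for one recording, in file order.

    Raises ValueError if the list is not sorted by start time or if a
    segment is contained in an earlier one.
    """
    prev_start = float("-inf")
    max_end = float("-inf")
    for i, (start, end) in enumerate(segments):
        if start < prev_start:
            raise ValueError(
                f"segment {i} starts at {start}, before the previous segment "
                f"({prev_start}); input must be sorted by start time"
            )
        if end <= max_end:
            raise ValueError(
                f"segment {i} ({start}, {end}) is entirely contained in an "
                f"earlier segment ending at {max_end}"
            )
        prev_start = start
        max_end = max(max_end, end)


check_segments([(0.0, 1.5), (0.75, 2.25), (1.5, 3.0)])  # passes silently
try:
    check_segments([(0.0, 10.0), (3.0, 5.0)])
except ValueError as err:
    print("bad input:", err)
```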

Oct 16 '18 19:10 david-ryan-snyder

@mmaciej2 I double-checked that the input segments do not overlap.

I do not understand what other requirements there are. Can you please explain in more detail?

I checked it with the snippet below, where:

name is the name of a recording,
(xs, xe) are the start and end times of segment x,
(ys, ye) are the start and end times of segment y,
dsegs is a dictionary mapping each recording to a list of all its segments

In [35]: for name, lst in dsegs.items():
    ...:     for xs, xe in lst:
    ...:         for ys, ye in lst:
    ...:             if  (xs < ys and ys < xe) or (xs < ye and ye < xe):
    ...:                 print('overlap for ', name, xs, xe, ys, ye)

Oct 17 '18 00:10 oplatek

@oplatek,

As far as I know, if there is no overlap in the input, it shouldn't be producing incorrect output.

Can you share some of the problematic output and the corresponding input that created it?

Oct 17 '18 00:10 mmaciej2

@mmaciej2 thank you for the help. I updated the script to validate the input.

It pointed me to the fact that I have a lot of consecutive segments. For example: (26.12, 26.66) and (26.66, 27.56); (35.04, 35.52) and (35.52, 36.53).


In [47]: for name, lst in dsegs.items():
    ...:     for i, (xs, xe) in enumerate(lst):
    ...:         for j, (ys, ye) in enumerate(lst):
    ...:             if ((xs <= ys and ys <= xe) or (xs <= ye and ye <= xe)) and i != j:
    ...:                 print('overlap for ', name, xs, xe, ys, ye, i, j)

...
('overlap for ', 'test21.wav', 26.12, 26.66, 26.66, 27.56, 4, 5)
('overlap for ', 'test21.wav', 26.66, 27.56, 26.12, 26.66, 5, 4)
('overlap for ', 'test21.wav', 35.04, 35.52, 35.52, 36.53, 7, 8)
('overlap for ', 'test21.wav', 35.52, 36.53, 35.04, 35.52, 8, 7)

Should consecutive segments be considered valid input or not?
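
The pairs above are flagged only because the comparison uses <=, so segments that merely touch count as overlapping. A sketch of a check that separates touching from truly overlapping segments follows; the name `strict_overlaps` is hypothetical. It sorts the segments and compares neighbours, which is enough to tell whether any overlap exists at all.

```python
def strict_overlaps(segments):
    """segments: list of (start, end) for one recording.

    Returns neighbouring pairs (after sorting) that share more than a
    single time point. Segments that only touch are not reported.
    """
    ordered = sorted(segments)
    found = []
    for (xs, xe), (ys, ye) in zip(ordered, ordered[1:]):
        if ys < xe:
            found.append(((xs, xe), (ys, ye)))
    return found


touching = [(26.12, 26.66), (26.66, 27.56), (35.04, 35.52), (35.52, 36.53)]
print(strict_overlaps(touching))
print(strict_overlaps([(0.0, 10.0), (3.0, 5.0)]))
```

Under a midpoint rule, the boundary between two touching segments would fall exactly on the shared timestamp, so no negative duration would come from them.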

Oct 17 '18 00:10 oplatek

FYI: My problem was that I was trying to represent multiple speakers in utt2spk, which led me to prefix utterance IDs with the spkID. As a consequence, the rttm files were not sorted by timestamp, which make_rttm expects. Every time the timestamps were out of order, a negative duration emerged.
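
In case it helps others, a sketch of reordering the segments by time before running make_rttm could look like this. The function name `sort_segments` is hypothetical, and it assumes the usual segments line layout of utterance ID, recording ID, start, end. Kaldi data directories are generally expected to stay sorted by utterance ID, so this would be applied to a copy used only for the RTTM step.

```python
def sort_segments(lines):
    """lines: segments lines of the form '<utt> <reco> <start> <end>'.

    Returns the lines ordered by recording, then start time, then end time.
    """
    def key(line):
        utt, reco, start, end = line.split()
        return (reco, float(start), float(end))

    return sorted(lines, key=key)


# Sorted by utterance ID, the spkB segment comes last even though it
# starts before the second spkA segment.
by_utt = [
    "spkA-utt1 reco1 0.0 2.0",
    "spkA-utt2 reco1 6.0 8.0",
    "spkB-utt1 reco1 3.0 5.0",
]
by_time = sort_segments(by_utt)
print("\n".join(by_time))
```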

Oct 24 '18 05:10 oplatek

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

Jun 19 '20 07:06 stale[bot]

I need help. I want to convert .lab files to rttm, but I am not able to run make_rttm.py.

May 10 '21 07:05 Sangramsingkayte

This issue has been automatically marked as stale by a bot solely because it has not had recent activity. Please add any comment (simply 'ping' is enough) to prevent the issue from being closed for 60 more days if you believe it should be kept open.

Jul 09 '21 09:07 stale[bot]
