whisperX
assign_word_speakers fix
Hello! It seems there is a mistake here. We should pick the speaker whose segment is closest, instead of summing the intersections per speaker.
Simple counterexample:
predicted_sample: [0, 1]
diarization_segments:
SPEAKER_00: [2, 3], [4, 5]
SPEAKER_01: [3, 4]
For SPEAKER_00, the summed intersection from dia_tmp.groupby("speaker")["intersection"].sum() will be sum([-1, -3]) = -4.
For SPEAKER_01, it will be sum([-2]) = -2.
(unless I made a mistake doing the calculation in my head)
Therefore .sort_values(ascending=False).index[0] will choose SPEAKER_01, even though it is not the closest: the nearest segment to the sample is SPEAKER_00's [2, 3].
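
To make the counterexample easy to check, here is a minimal sketch in plain Python (no pandas) that reproduces the numbers above. It assumes the intersection is computed as min(ends) - max(starts), which is what the values -1, -3 and -2 imply. The helper names `signed_intersection`, `speaker_by_sum` and `speaker_by_closest` are hypothetical and not part of whisperX, and `speaker_by_closest` is only one possible reading of "closest" (the single segment with the largest intersection), not necessarily the exact change in this PR.

```python
from collections import defaultdict

# Sample interval from the counterexample: (start, end).
predicted_sample = (0.0, 1.0)

# Diarization segments from the counterexample: (speaker, start, end).
diarization_segments = [
    ("SPEAKER_00", 2.0, 3.0),
    ("SPEAKER_00", 4.0, 5.0),
    ("SPEAKER_01", 3.0, 4.0),
]


def signed_intersection(seg_start, seg_end, start, end):
    # Assumed formula: positive when the intervals overlap,
    # otherwise the negated gap between them.
    return min(seg_end, end) - max(seg_start, start)


def speaker_by_sum(sample, segments):
    # Mirrors groupby("speaker")["intersection"].sum() followed by
    # taking the speaker with the largest total.
    totals = defaultdict(float)
    for speaker, seg_start, seg_end in segments:
        totals[speaker] += signed_intersection(seg_start, seg_end, *sample)
    return max(totals, key=totals.get), dict(totals)


def speaker_by_closest(sample, segments):
    # Hypothetical alternative: pick the speaker of the single segment
    # with the largest intersection (i.e. the smallest gap here).
    best = max(
        segments,
        key=lambda seg: signed_intersection(seg[1], seg[2], *sample),
    )
    return best[0]


sum_choice, totals = speaker_by_sum(predicted_sample, diarization_segments)
closest_choice = speaker_by_closest(predicted_sample, diarization_segments)

print(totals)          # per-speaker summed intersections
print(sum_choice)      # speaker chosen by the sum-based rule
print(closest_choice)  # speaker chosen by the closest-segment rule
```

With these inputs the sum-based rule picks SPEAKER_01 (total -2 versus -4), while the closest-segment rule picks SPEAKER_00, because a speaker with several distant segments accumulates a larger negative total than a speaker with one moderately distant segment.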
@m-bain could you have a look at this please?