FunASR
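Running speech_diarization produces segment timestamps longer than the audio itself
When running the speaker_diarization task, I found that the segment timestamps exceed the total length of the input audio.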
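from modelscope.pipelines import pipeline
from modelscope.utils.constant import Tasks

# Build the SOND speaker diarization pipeline with its speaker verification model.
inference_diar_pipline = pipeline(
    mode="sond_demo",
    num_workers=0,
    task=Tasks.speaker_diarization,
    diar_model_config="sond.yaml",
    model='damo/speech_diarization_sond-zh-cn-alimeeting-16k-n16k4-pytorch',
    model_revision="v1.0.5",
    sv_model="damo/speech_xvector_sv-zh-cn-cnceleb-16k-spk3465-pytorch",
    sv_model_revision="v1.2.2",
)
# First entry is the recording to diarize; the rest are the speaker audio files.
audio_list = [
    "../2.wav",
    "../spk1.wav",
    "../spk2.wav",
    "../spk3.wav",
    "../spk4.wav",
]
results = inference_diar_pipline(audio_in=audio_list)
print(results)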
{'text': 'spk1 [(0.0, 18.8), (55.36, 59.04), (68.16, 74.0), (93.92, 94.8), (95.6, 106.48), (152.88, 154.64), (158.16, 161.28)]\nspk2 [(18.8, 55.36), (58.88, 68.16), (74.0, 91.36), (94.8, 95.6), (106.48, 144.56), (154.64, 158.16)]\nspk3 [(91.28, 94.0)]\nspk4 [(125.84, 128.0), (130.96, 131.68), (144.56, 152.88)]'}
The audio is only 117 seconds long in total, yet the segmentation contains results such as (144.56, 152.88).
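To make the mismatch easy to reproduce, here is a minimal sketch that parses the `text` field printed above and lists every segment ending after the audio does. The helper names (`parse_diarization_text`, `segments_beyond`, `wav_duration_seconds`) are hypothetical and not part of FunASR or ModelScope; the 117-second duration is taken from the report, and the `spkN [(start, end), ...]` line layout is assumed from the output shown.

```python
import ast
import wave


def wav_duration_seconds(path):
    """Duration of a PCM WAV file in seconds (hypothetical helper, stdlib only)."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


def parse_diarization_text(text):
    """Parse lines of the form 'spkN [(start, end), ...]' into a dict."""
    segments = {}
    for line in text.strip().splitlines():
        speaker, _, spans = line.partition(" ")
        segments[speaker] = ast.literal_eval(spans)
    return segments


def segments_beyond(segments, duration_s):
    """Return (speaker, start, end) for every segment ending after duration_s."""
    return [
        (speaker, start, end)
        for speaker, spans in segments.items()
        for start, end in spans
        if end > duration_s
    ]


# Output copied from the report above.
results = {
    'text': 'spk1 [(0.0, 18.8), (55.36, 59.04), (68.16, 74.0), (93.92, 94.8), '
            '(95.6, 106.48), (152.88, 154.64), (158.16, 161.28)]\n'
            'spk2 [(18.8, 55.36), (58.88, 68.16), (74.0, 91.36), (94.8, 95.6), '
            '(106.48, 144.56), (154.64, 158.16)]\n'
            'spk3 [(91.28, 94.0)]\n'
            'spk4 [(125.84, 128.0), (130.96, 131.68), (144.56, 152.88)]'
}

# Assumption: 117 s as stated in the report. With the file at hand,
# wav_duration_seconds("../2.wav") could be used instead.
duration_s = 117.0

overruns = segments_beyond(parse_diarization_text(results['text']), duration_s)
for speaker, start, end in overruns:
    print(f"{speaker}: ({start}, {end}) ends after {duration_s} s")
```

This only detects the out-of-range segments; it does not explain why the pipeline produces them.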
When raising issues, please refer to https://github.com/alibaba-damo-academy/FunASR/issues/1073
@T0L0ve Has this problem been resolved? I am seeing a similar issue.