FunASR
Can't predict timestamp, and speaker diarization relies on timestamps.
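from funasr import AutoModel

# `device` is defined elsewhere in the script
model = AutoModel(
    model="FunAudioLLM/SenseVoiceSmall",
    vad_model="fsmn-vad",
    punc_model="ct-punc",
    spk_model="cam++",  # speaker diarization; this is what triggers the error below
    vad_kwargs={"max_single_segment_time": 15000},
    batch_size=1,
    hub="hf",
    device=device,
)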
console error =>
ERROR:root:Only 'iic/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-pytorch'
and 'iic/speech_seaco_paraformer_large_asr_nat-zh-cn-16k-common-vocab8404-pytorch'
can predict timestamp, and speaker diarization relies on timestamps.
Same error here.
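@whmzsu @TaiYouWeb The error occurs because the SenseVoice model does not support timestamp prediction, while speaker diarization depends on timestamp information. That said, timestamps should be obtainable in principle, since VAD detection already yields the start and end point of each segment, right? The maintainers have not commented on this, though, so I ended up writing the timestamps myself.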
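For anyone wanting to try the same workaround, here is a minimal sketch of the idea, not the commenter's actual code. It assumes you run VAD separately and get segments as [start_ms, end_ms] pairs in milliseconds (the format the FunASR README describes for the VAD model output), and that you have already transcribed each segment on its own. The function name attach_segment_timestamps and the output keys are hypothetical, chosen only for illustration.

```python
from typing import Dict, List, Sequence, Union

Entry = Dict[str, Union[float, str]]


def attach_segment_timestamps(
    vad_segments_ms: Sequence[Sequence[int]],
    texts: Sequence[str],
) -> List[Entry]:
    """Pair each VAD segment with the text recognized for that segment.

    vad_segments_ms: [[start_ms, end_ms], ...] (assumed VAD output format)
    texts: one transcript per segment, in the same order
    """
    if len(vad_segments_ms) != len(texts):
        raise ValueError("need exactly one transcript per VAD segment")

    entries: List[Entry] = []
    for (start_ms, end_ms), text in zip(vad_segments_ms, texts):
        if end_ms < start_ms:
            raise ValueError(f"segment ends before it starts: {start_ms}-{end_ms}")
        entries.append(
            {
                "start_s": start_ms / 1000.0,  # milliseconds to seconds
                "end_s": end_ms / 1000.0,
                "text": text,
            }
        )
    return entries


if __name__ == "__main__":
    # Made-up segments and transcripts, only to show the call shape.
    demo = attach_segment_timestamps([[0, 1500], [2000, 4250]], ["hello", "world"])
    for entry in demo:
        print(entry)
```

This only gives segment-level timestamps, which is coarser than the per-token timestamps the Paraformer models named in the error message produce, so speaker boundaries that fall inside a single VAD segment would not be resolved.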