FunASR can't predict timestamps, and speaker diarization relies on timestamps.

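from funasr import AutoModel

# `device` is defined elsewhere in the original script (e.g. "cuda:0" or "cpu").
model = AutoModel(
    model="FunAudioLLM/SenseVoiceSmall",
    vad_model="fsmn-vad",
    punc_model="ct-punc",
    spk_model="cam++",  # speaker diarization; this is what triggers the error below
    vad_kwargs={"max_single_segment_time": 15000},
    batch_size=1,
    hub="hf",
    device=device,
)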

Console error:

ERROR:root:Only 'iic/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-pytorch'
                    and 'iic/speech_seaco_paraformer_large_asr_nat-zh-cn-16k-common-vocab8404-pytorch'
                    can predict timestamp, and speaker diarization relies on timestamps.

Oct 06 '24 11:10 TaiYouWeb

Same error here.

Jun 28 '25 01:06 whmzsu

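@whmzsu @TaiYouWeb The error occurs because the SenseVoice model does not support timestamp prediction, while speaker diarization depends on timestamp information. That said, it seems timestamps should be supportable, since VAD detection already yields the start and end point of each segment. There has been no official statement on this, though, so I ended up writing the timestamps myself.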
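
A minimal sketch of the "write the timestamps yourself" idea, not the code actually used above. It assumes the VAD step yields segments as `[[begin_ms, end_ms], ...]` in milliseconds (the format fsmn-vad is documented to return) and that each segment has already been transcribed separately. The helper names `ms_to_clock` and `attach_vad_timestamps`, and the sample values, are hypothetical. Note that this only gives segment-level timestamps; it does not by itself make `spk_model` work with SenseVoice.

```python
from typing import Dict, List, Sequence, Union


def ms_to_clock(ms: int) -> str:
    """Format a millisecond offset as HH:MM:SS.mmm."""
    hours, rem = divmod(ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1_000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}.{millis:03d}"


def attach_vad_timestamps(
    vad_segments: Sequence[Sequence[int]],
    texts: Sequence[str],
) -> List[Dict[str, Union[int, str]]]:
    """Pair each VAD segment [begin_ms, end_ms] with its transcribed text.

    Assumes one text per segment, in the same order as the segments.
    """
    if len(vad_segments) != len(texts):
        raise ValueError("expected exactly one text per VAD segment")

    entries: List[Dict[str, Union[int, str]]] = []
    for (start_ms, end_ms), text in zip(vad_segments, texts):
        entries.append(
            {
                "start_ms": start_ms,
                "end_ms": end_ms,
                "start": ms_to_clock(start_ms),
                "end": ms_to_clock(end_ms),
                "text": text,
            }
        )
    return entries


# Hypothetical sample data standing in for real VAD output and ASR text.
segments = [[70, 2340], [2620, 6200]]
texts = ["hello", "world"]

for entry in attach_vad_timestamps(segments, texts):
    print(f"[{entry['start']} --> {entry['end']}] {entry['text']}")
```

Keeping the raw millisecond values next to the formatted strings makes it easier to align the segments later with whatever separate speaker-labelling step is used.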

Sep 11 '25 01:09 jinwater88