
Can't predict timestamp, and speaker diarization relies on timestamps.

Status: Open · TaiYouWeb opened this issue 1 year ago · 2 comments

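import torch
from funasr import AutoModel

# Use the first GPU when available, otherwise fall back to CPU.
device = "cuda:0" if torch.cuda.is_available() else "cpu"

model = AutoModel(
    model="FunAudioLLM/SenseVoiceSmall",
    vad_model="fsmn-vad",
    punc_model="ct-punc",
    spk_model="cam++",
    vad_kwargs={"max_single_segment_time": 15000},
    batch_size=1,
    hub="hf",
    device=device,
)

# The error below is logged once inference runs ("audio.wav" is a placeholder path).
res = model.generate(input="audio.wav")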

Console error:

ERROR:root:Only 'iic/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-pytorch'
                    and 'iic/speech_seaco_paraformer_large_asr_nat-zh-cn-16k-common-vocab8404-pytorch'
                    can predict timestamp, and speaker diarization relies on timestamps.

TaiYouWeb · Oct 06 '24 11:10
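The log message says speaker diarization "relies on timestamps". The sketch below illustrates why, under stated assumptions: once the ASR model emits a start/end time for every token, punctuation-split sentences can be placed on the timeline and matched to whichever speaker turn overlaps them most. The result layout (`tokens`, `timestamp`), `SpeakerTurn`, and `label_sentences` are hypothetical names chosen for illustration; this is not FunASR's internal code.

```python
from dataclasses import dataclass


@dataclass
class SpeakerTurn:
    # Hypothetical output of a speaker-segmentation step, times in ms.
    start_ms: int
    end_ms: int
    speaker: int


def overlap_ms(a_start, a_end, b_start, b_end):
    """Length of the overlap between two [start, end) intervals, in ms."""
    return max(0, min(a_end, b_end) - max(a_start, b_start))


def label_sentences(result, sentence_ends, turns):
    """Give each sentence the speaker whose turns overlap it the most.

    result: hypothetical dict {"tokens": [...], "timestamp": [[start_ms, end_ms], ...]}
    sentence_ends: exclusive token index where each sentence ends
    turns: list of SpeakerTurn
    """
    if "timestamp" not in result:
        # Without per-token times, sentences cannot be placed on the timeline.
        raise ValueError("ASR result has no token timestamps; cannot align speakers")
    tokens, stamps = result["tokens"], result["timestamp"]
    sentences, begin = [], 0
    for end in sentence_ends:
        # Sentence span = first token's start to last token's end.
        start_ms, end_ms = stamps[begin][0], stamps[end - 1][1]
        votes = {}
        for t in turns:
            votes[t.speaker] = votes.get(t.speaker, 0) + overlap_ms(
                start_ms, end_ms, t.start_ms, t.end_ms
            )
        speaker = max(votes, key=votes.get) if votes else None
        sentences.append({
            "text": "".join(tokens[begin:end]).strip(),
            "start": start_ms,
            "end": end_ms,
            "spk": speaker,
        })
        begin = end
    return sentences


result = {"tokens": ["hello", " there", " bye"],
          "timestamp": [[0, 500], [500, 1000], [2000, 2500]]}
turns = [SpeakerTurn(0, 1200, 0), SpeakerTurn(1800, 3000, 1)]
print(label_sentences(result, [2, 3], turns))
```

A result with no `timestamp` entry, like the SenseVoiceSmall output in this issue, leaves nothing to align, so the function refuses rather than guessing.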

Same error here.

whmzsu · Jun 28 '25 01:06

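@whmzsu @TaiYouWeb The error occurs because the SenseVoice model does not support timestamp prediction, and speaker diarization depends on timestamp information. Still, timestamps should be obtainable, shouldn't they? VAD detection already gives the start and end point of each segment. There has been no official statement on this, so I ended up writing the timestamp handling myself.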

jinwater88 · Sep 11 '25 01:09
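The VAD-based workaround described in the last comment can be sketched as follows. It assumes fsmn-vad's segment list (`[[beg, end], ...]` in milliseconds), one SenseVoice transcript per segment, and one speaker embedding per segment computed separately. The function names and the greedy cosine-threshold clustering are hypothetical stand-ins, not FunASR's own clustering, and the 0.7 threshold is arbitrary.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def vad_segment_diarization(vad_segments, texts, embeddings, threshold=0.7):
    """Label each VAD segment with a speaker id using greedy online clustering.

    vad_segments: [[beg_ms, end_ms], ...]
    texts: one transcript per segment (hypothetical: ASR run on each slice)
    embeddings: one speaker embedding per segment (hypothetical)
    """
    if not (len(vad_segments) == len(texts) == len(embeddings)):
        raise ValueError("need one text and one embedding per VAD segment")
    centroids, counts, out = [], [], []
    for (beg, end), text, emb in zip(vad_segments, texts, embeddings):
        # Find the most similar existing speaker centroid.
        best, best_sim = None, -math.inf
        for i, c in enumerate(centroids):
            sim = cosine(emb, c)
            if sim > best_sim:
                best, best_sim = i, sim
        if best is None or best_sim < threshold:
            # Not similar enough to anyone seen so far: open a new speaker.
            centroids.append(list(emb))
            counts.append(1)
            best = len(centroids) - 1
        else:
            # Update that speaker's centroid as a running mean.
            counts[best] += 1
            n = counts[best]
            centroids[best] = [c + (e - c) / n for c, e in zip(centroids[best], emb)]
        out.append({"start": beg, "end": end, "text": text, "spk": best})
    return out


segments = [[0, 1000], [1200, 2000], [2500, 3000]]
print(vad_segment_diarization(segments, ["a", "b", "c"],
                              [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]))
```

The timestamps this yields are segment-level only: a VAD segment that contains two speakers still receives a single label, which is coarser than the token-level alignment the timestamp-capable Paraformer models allow.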