FunASR
How can FSMN-VAD precisely locate the end of a sentence?
In my audio, the gap between the teacher's question and the student's answer is about 1 second. I reduced max_end_silence_time to 500 ms to try to pinpoint sentence endings, but it had no effect: the teacher's and the student's speech are not separated cleanly. What other settings could I try? The vad_model configuration is as follows:
frontend: WavFrontendOnline
frontend_conf:
    fs: 16000
    window: hamming
    n_mels: 80
    frame_length: 25
    frame_shift: 10
    dither: 0.0
    lfr_m: 5
    lfr_n: 1

model: FsmnVADStreaming
model_conf:
    sample_rate: 16000
    detect_mode: 1
    snr_mode: 0
    max_end_silence_time: 500
    max_start_silence_time: 3000
    do_start_point_detection: True
    do_end_point_detection: True
    window_size_ms: 200
    sil_to_speech_time_thres: 150
    speech_to_sil_time_thres: 150
    speech_2_noise_ratio: 1.0
    do_extend: 1
    lookback_time_start_point: 200
    lookahead_time_end_point: 100
    max_single_segment_time: 60000
    snr_thres: -100.0
    noise_frame_num_used_for_snr: 100
    decibel_thres: -100.0
    speech_noise_thres: 0.6
    fe_prior_thres: 0.0001
    silence_pdf_num: 1
    sil_pdf_ids: [0]
    speech_noise_thresh_low: -0.1
    speech_noise_thresh_high: 0.3
    output_frame_probs: False
    frame_in_ms: 10
    frame_length_ms: 25

encoder: FSMN
encoder_conf:
    input_dim: 400
    input_affine_dim: 140
    fsmn_layers: 4
    linear_dim: 250
    proj_dim: 128
    lorder: 20
    rorder: 0
    lstride: 1
    rstride: 0
    output_affine_dim: 140
    output_dim: 248
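As a side note, the encoder's input_dim is tied to the frontend settings above: with low-frame-rate stacking, lfr_m consecutive n_mels-dimensional frames are concatenated into one feature vector, so the FSMN input is 80 × 5 = 400 features, matching input_dim: 400. A quick consistency check (the dictionaries below just mirror the config keys quoted above):

```python
frontend = {"n_mels": 80, "lfr_m": 5, "lfr_n": 1}
encoder = {"input_dim": 400}

# Low-frame-rate (LFR) stacking: lfr_m consecutive n_mels-dim
# frames are concatenated into one input feature vector.
stacked_dim = frontend["n_mels"] * frontend["lfr_m"]
assert stacked_dim == encoder["input_dim"]
print(stacked_dim)  # 400
```

This means the timing parameters (max_end_silence_time and friends) can be tuned freely, but the frontend and encoder dimensions have to stay consistent with each other.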
The code is as follows:
import os
# os.environ["CUDA_VISIBLE_DEVICES"] = "-1,"
from modelscope.pipelines import pipeline
from modelscope.utils.constant import Tasks

if __name__ == '__main__':
    audio_in = 'https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/asr_speaker_demo.wav'
    audio_in2 = '/home/STAna/stana/file/c28ab37273fc9a91b7722b963b320aff.wav'
    output_dir = "./results"
    inference_pipeline = pipeline(
        task=Tasks.auto_speech_recognition,
        model='iic/speech_paraformer-large-vad-punc-spk_asr_nat-zh-cn',
        model_revision='v2.0.4',
        vad_model='iic/speech_fsmn_vad_zh-cn-16k-common-pytorch',
        vad_model_revision="v2.0.4",
        punc_model='iic/punc_ct-transformer_cn-en-common-vocab471067-large',
        punc_model_revision="v2.0.4",
        spk_model="iic/speech_campplus_sv_zh-cn_16k-common",
        spk_model_revision="v2.0.2",
        output_dir=output_dir,
    )
    rec_result = inference_pipeline(audio_in2, batch_size_s=300, batch_size_token_threshold_s=40)
    print(rec_result)
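If tuning max_end_silence_time alone does not separate the turns, one workaround is to post-process the timestamped segments the pipeline returns: start a new group whenever the silence between two consecutive segments exceeds a gap threshold below the ~1 s teacher/student pause. This is a minimal sketch, assuming you have already extracted the segments as (start_ms, end_ms) pairs from rec_result; the function name and the 800 ms threshold are illustrative, not part of the FunASR API:

```python
def split_by_gap(segments, gap_ms=800):
    """Group (start_ms, end_ms) speech segments, starting a new
    group whenever the silence between two consecutive segments
    exceeds gap_ms. Returns a list of segment groups."""
    groups = []
    for seg in segments:
        if groups and seg[0] - groups[-1][-1][1] <= gap_ms:
            groups[-1].append(seg)  # short pause: same speaker turn
        else:
            groups.append([seg])    # long pause: new speaker turn
    return groups

# Example: two short segments 200 ms apart, then a ~1000 ms pause.
segs = [(0, 1200), (1400, 2600), (3600, 5000)]
print(split_by_gap(segs))  # [[(0, 1200), (1400, 2600)], [(3600, 5000)]]
```

Each resulting group can then be treated as one speaker turn, independently of how aggressively the VAD itself cuts.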
What have you tried?
What's your environment?
- OS (e.g., Linux):
- FunASR Version (e.g., 1.0.17):
- ModelScope Version (e.g., 1.13.0):
- PyTorch Version (e.g., 2.0.0):
- How you installed funasr (pip, source):
- Python version:
- GPU (e.g., V100M32):
- CUDA/cuDNN version (e.g., cuda11.7):
- Docker version (e.g., funasr-runtime-sdk-cpu-0.4.1):
- Any other relevant information:
max_end_silence_time=100
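The one-line suggestion above corresponds to lowering the end-silence threshold in the VAD model's model_conf (the same config block quoted in the question), so that a segment is closed after a much shorter pause:

```yaml
model_conf:
    max_end_silence_time: 100   # was 500 in the question's config
```

Note that a very small value can also split a single sentence at its natural pauses, so it may need to be tuned together with speech_to_sil_time_thres.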