FunASR
How do I run streaming inference with a fixed-length VAD ONNX model?
1. Variable-length model inference: use the model whose input has shape (1, feats_length, 400). Streaming inference uses step=5120 (320 ms at a 16 kHz sampling rate).
```python
# `model`, `speech` (16 kHz waveform) and `speech_length` are defined earlier (not shown).
step = 5120  # step size in samples: 5120 = 0.32 s * 16000 (duration in s * sample rate)
param_dict = {"in_cache": []}
vad_segments = []  # collected VAD results
for sample_offset in range(0, speech_length, step):
    end = min(sample_offset + step, speech_length)
    is_final = end == speech_length  # True only for the last (possibly shorter) chunk
    param_dict["is_final"] = is_final
    segments_result = model(
        audio_in=speech[sample_offset:end], param_dict=param_dict
    )
    if segments_result:
        # print(segments_result)
        vad_segments.append(segments_result[0][0])
```
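With step=5120, feats_length in the (1, feats_length, 400) input is not the same for every chunk. The sketch below models the per-chunk frame count under assumptions that are not stated in this issue: an fbank with a 25 ms window and 10 ms hop (400 / 160 samples at 16 kHz) whose leftover samples carry over to the next chunk, and LFR stacking with lfr_m=5, lfr_n=1 (5 frames x 80 mel bins = 400 dims) that left-pads the first chunk with 2 copies of its first frame, holds back 4 frames of right context until the next chunk, and pads the right edge only on the final chunk. `lfr_frame_counts` is a hypothetical helper, not a FunASR API.

```python
FRAME_LEN = 400    # 25 ms analysis window at 16 kHz (assumed)
FRAME_SHIFT = 160  # 10 ms hop at 16 kHz (assumed)
LFR_M = 5          # LFR stacks 5 fbank frames; lfr_n = 1 is assumed throughout


def lfr_frame_counts(speech_length, step=5120):
    """Hypothetical model of the per-chunk LFR frame count of a streaming fbank+LFR frontend."""
    counts = []
    sample_cache = 0  # samples left over after the last full frame hop
    lfr_cache = None  # fbank frames held back as LFR context (None before the first chunk)
    for start in range(0, speech_length, step):
        end = min(start + step, speech_length)
        is_final = end == speech_length
        n = sample_cache + (end - start)
        n_fbank = 1 + (n - FRAME_LEN) // FRAME_SHIFT if n >= FRAME_LEN else 0
        sample_cache = n - n_fbank * FRAME_SHIFT
        if lfr_cache is None:
            lfr_cache = (LFR_M - 1) // 2  # first chunk: left-pad with copies of frame 0
        t = lfr_cache + n_fbank
        if is_final:
            # Final chunk: the right context is padded, so every pending frame is emitted.
            n_lfr = t - (LFR_M - 1) // 2
        else:
            # Only frames whose full LFR_M-frame window is available are emitted.
            n_lfr = max(t - LFR_M + 1, 0)
        lfr_cache = t - n_lfr  # frames kept as context for the next chunk
        counts.append(n_lfr)
    return counts
```

Under these assumptions, `lfr_frame_counts(3 * 5120 + 634)` returns `[28, 32, 32, 6]`, the same per-chunk lengths as the feature shapes reported for the fixed-size model: the first chunk is 4 frames short because its right context is still cached, and over the whole stream the counts add up to the plain fbank frame count.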
Inference result (verified correct by visual inspection):

```
origin vad segments: [[720, -1], [-1, 2100], [2800, -1], [-1, 4300], [4820, -1], [-1, 6190], [6900, -1], [-1, 8270], [12600, -1], [-1, 13970], [14680, -1], [-1, 16090], [19810, -1], [-1, 21130], [22030, -1], [-1, 23290], [23800, -1], [-1, 25230], [26020, -1], [-1, 27300], [28190, -1], [-1, 29440], [30310, -1], [-1, 31610], [32410, -1], [-1, 33660], [34340, -1], [-1, 35730], [36460, -1], [-1, 37770], [38560, -1], [-1, 39850], [40460, -1], [-1, 41830], [42250, -1], [-1, 43660], [44250, -1], [-1, 45460]]
```
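The streaming output above is a flat list of events in milliseconds, where `[start, -1]` opens a speech segment and `[-1, end]` closes it. To plot or compare runs it helps to pair them into closed segments. A minimal sketch, assuming that convention; treating a pair with both values set as an already-closed segment is an assumption, and `merge_vad_events` is a hypothetical helper.

```python
def merge_vad_events(events):
    """Pair streaming VAD events into (start_ms, end_ms) segments.

    [start, -1] opens a segment and [-1, end] closes it. A trailing segment
    that never closes is returned with end=None.
    """
    segments = []
    open_start = None
    for start, end in events:
        if start != -1 and end == -1:
            open_start = start  # segment begins, end not known yet
        elif start == -1 and end != -1:
            segments.append((open_start, end))  # close the pending segment
            open_start = None
        else:
            segments.append((start, end))  # assumed: both ends reported in one chunk
    if open_start is not None:
        segments.append((open_start, None))
    return segments
```

For example, `merge_vad_events([[720, -1], [-1, 2100], [2800, -1], [-1, 4300]])` yields `[(720, 2100), (2800, 4300)]`.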
2. Fixed-size model inference: export a model with fixed input dimensions, i.e. an input of shape (1, 32, 400). Streaming inference again uses step=5120 (320 ms at 16 kHz), with the same code as in 1. Inference result:

```
origin vad segments: [[440, -1], [-1, 1820], [2520, -1], [-1, 4020], [4540, -1], [-1, 5910], [6620, -1], [-1, 7990], [12320, -1], [-1, 13690], [14400, -1], [-1, 15810], [19530, -1], [-1, 20850], [21750, -1], [-1, 23010], [23520, -1], [-1, 24950], [25740, -1], [-1, 27020], [27910, -1], [-1, 29160], [30030, -1], [-1, 31330], [32130, -1], [-1, 33380], [34060, -1], [-1, 35450], [36180, -1], [-1, 37490], [38280, -1], [-1, 39570], [40180, -1], [-1, 41550], [41970, -1], [-1, 43380], [43970, -1]]
```

Visualizing this result shows that every speech segment has moved earlier in time. Printing the input shapes at each inference step gives:

```
waveforms shape: (1, 5120)
feats shape: (1, 28, 400)
onnx input feats shape: ((1, 28, 400), (1, 128, 19, 1))
waveforms shape: (1, 5120)
feats shape: (1, 32, 400)
onnx input feats shape: ((1, 32, 400), (1, 128, 19, 1))
waveforms shape: (1, 5120)
feats shape: (1, 32, 400)
onnx input feats shape: ((1, 32, 400), (1, 128, 19, 1))
....
waveforms shape: (1, 634)
feats shape: (1, 6, 400)
onnx input feats shape: ((1, 6, 400), (1, 128, 19, 1))
```

Is the problem caused by the fbank features of the first window having a different number of frames from all later windows? How should inference with a fixed-length ONNX model be done?
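Every segment boundary in the fixed-size result is exactly 280 ms earlier than in the variable-length result (720 vs 440, 2100 vs 1820, ..., 44250 vs 43970). At a 10 ms hop that is 28 frames, the length of the first feature chunk, which is consistent with the first chunk's frames not being accounted for, though that cause is not established here. One way to drive a model whose time axis is fixed at 32 is to decouple audio chunking from model chunking: buffer the LFR features, call the model only on full 32-frame windows, and pad only the final window, discarding the scores of padded frames. This sketch assumes (a) the exported model is causal (FSMN memory with no right context), so carrying the (1, 128, 19, 1) cache across re-chunked windows reproduces the variable-length scores; (b) `model_fn` is a hypothetical wrapper around the ONNX session returning `(scores, new_cache)`; and (c) the downstream VAD post-processing tolerates scores lagging the audio by up to 31 frames, which the issue does not confirm. `FixedLengthFeeder` is a hypothetical name.

```python
import numpy as np


class FixedLengthFeeder:
    """Hypothetical adapter that feeds a fixed-T model from a variable-length feature stream."""

    def __init__(self, model_fn, cache, fixed_t=32):
        # model_fn(feats[1, fixed_t, D], cache) -> (scores[1, fixed_t, C], new_cache)
        self.model_fn = model_fn
        self.cache = cache
        self.fixed_t = fixed_t
        self.buffer = None  # LFR frames received but not yet sent to the model

    def push(self, feats, is_final=False):
        """Append feats [1, t, D]; return scores for every frame consumed, or None."""
        if self.buffer is None:
            self.buffer = feats
        else:
            self.buffer = np.concatenate([self.buffer, feats], axis=1)
        outputs = []
        # Run the model on every complete fixed_t window; the cache only sees real frames.
        while self.buffer.shape[1] >= self.fixed_t:
            window = self.buffer[:, : self.fixed_t]
            self.buffer = self.buffer[:, self.fixed_t :]
            scores, self.cache = self.model_fn(window, self.cache)
            outputs.append(scores)
        # On the last chunk, pad the remainder by repeating its last frame,
        # then keep only the scores that belong to real frames.
        n_valid = self.buffer.shape[1]
        if is_final and n_valid > 0:
            pad = np.repeat(self.buffer[:, -1:], self.fixed_t - n_valid, axis=1)
            scores, self.cache = self.model_fn(
                np.concatenate([self.buffer, pad], axis=1), self.cache
            )
            outputs.append(scores[:, :n_valid])
            self.buffer = self.buffer[:, :0]
        return np.concatenate(outputs, axis=1) if outputs else None
```

Padding only the final window keeps synthetic frames out of the FSMN cache; padding every short chunk (such as the 28-frame first chunk) to 32 instead would let padded frames leak into the cache that the next chunk reads, and would shift the frame index of every later score.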
Could someone please take a look at this? Thanks!