FunASR
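Streaming recognition from a microphone performs poorly, with frequent misrecognitions
1. I split the streaming audio from the microphone into chunks and feed them to the model, but the recognition quality is much worse than with a local WAV file, and there are many misrecognitions. What causes this, and would adding VAD help?
2. Also, does paraformer-zh run on the GPU? I do see GPU memory usage increase accordingly, but the README seems to say that GPU support is not implemented.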
import numpy as np
import sounddevice as sd
from funasr import AutoModel

sd.default.device = 27  # use the device with ID 27

model = AutoModel(model="paraformer-zh-streaming")

chunk_size = [0, 20, 5]
encoder_chunk_look_back = 4  # number of chunks to lookback for encoder self-attention
decoder_chunk_look_back = 1  # number of encoder chunks to lookback for decoder cross-attention

chunk_stride = chunk_size[1] * 960  # 600ms
buffer = None  # microphone data buffer
cache = {}  # FunASR cache

stride_ratio = 3


def callback(indata, frames, time, status):
    global buffer, cache
    if buffer is None:
        buffer = indata
    else:
        buffer = np.append(buffer, indata)  # indata -> buffer
    # if len(buffer) < chunk_stride * 3:
    if len(buffer) < chunk_stride * stride_ratio:
        return
    # chunk format
    chunk = np.array([buffer[i] for i in range(0, chunk_stride * stride_ratio) if i % stride_ratio == 0])
    res = model.generate(
        input=chunk,
        cache=cache,
        is_final=True,
        chunk_size=chunk_size,
        encoder_chunk_look_back=encoder_chunk_look_back,
        decoder_chunk_look_back=decoder_chunk_look_back,
    )
    print(res)
    buffer = buffer[chunk_stride * stride_ratio:]  # drop the part of the buffer that has already been processed


with sd.InputStream(device=27, samplerate=16000, callback=callback):
    sd.sleep(600000)  # 10 minutes
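To make the buffering arithmetic in the snippet above easier to check, here is a minimal, standard-library-only sketch. It is not FunASR code and not a confirmed fix. `ChunkBuffer`, `chunk_duration_ms`, and `effective_rate` are hypothetical helper names introduced only for illustration. The constants are taken from the post, and the sketch assumes the stream really delivers 16 kHz samples, as requested via `samplerate=16000`.

```python
from typing import List

SAMPLE_RATE = 16000                 # Hz, as passed to sd.InputStream in the post
CHUNK_SIZE = [0, 20, 5]             # as in the post
CHUNK_STRIDE = CHUNK_SIZE[1] * 960  # samples per model chunk, as in the post
STRIDE_RATIO = 3                    # every 3rd sample is kept in the post


def chunk_duration_ms(n_samples: int, sample_rate: int) -> float:
    """Duration of n_samples at sample_rate, in milliseconds."""
    return 1000.0 * n_samples / sample_rate


def effective_rate(sample_rate: int, stride_ratio: int) -> float:
    """Sample rate that remains after keeping one sample out of stride_ratio."""
    return sample_rate / stride_ratio


class ChunkBuffer:
    """Hypothetical helper: collects incoming blocks, hands out fixed-size chunks."""

    def __init__(self, chunk_len: int) -> None:
        self.chunk_len = chunk_len
        self._samples: List[float] = []

    def push(self, block: List[float]) -> List[List[float]]:
        """Append a block and return every complete chunk now available."""
        self._samples.extend(block)
        chunks = []
        while len(self._samples) >= self.chunk_len:
            chunks.append(self._samples[:self.chunk_len])
            del self._samples[:self.chunk_len]
        return chunks

    def flush(self) -> List[float]:
        """Return the remaining samples (possibly empty) at end of stream."""
        rest, self._samples = self._samples, []
        return rest


if __name__ == "__main__":
    print(CHUNK_STRIDE)                                  # samples per chunk
    print(chunk_duration_ms(CHUNK_STRIDE, SAMPLE_RATE))  # chunk length in ms
    print(effective_rate(SAMPLE_RATE, STRIDE_RATIO))     # rate after decimation
```

Under these assumptions, `chunk_size[1] = 20` gives 19200 samples per chunk, which is 1200 ms at 16 kHz rather than the 600 ms in the comment. Keeping every third sample of a 16 kHz stream leaves roughly 5.3 kHz audio. In a streaming loop, each chunk returned by `push` would normally be passed with `is_final=False`, and only the remainder from `flush` with `is_final=True`. Whether any of this explains the misrecognitions would need to be verified against the actual device and model.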
What's your environment?
Windows 10
- OS (e.g., Linux):
- FunASR Version (e.g., 1.0.0):
- ModelScope Version (e.g., 1.11.0):
- PyTorch Version (e.g., 2.0.0):
- How you installed funasr (pip, source):
- Python version:
- GPU (e.g., V100M32)
- CUDA/cuDNN version (e.g., cuda11.7):
- Docker version (e.g., funasr-runtime-sdk-cpu-0.4.1)
- Any other relevant information: