FunASR

Streaming recognition from a microphone performs poorly and keeps producing misrecognitions

Open Joseph513shen opened this issue 6 months ago • 8 comments

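1. I convert the microphone's streaming audio into chunks and feed them to the model, but recognition is much worse than with a local wav file and there are many misrecognitions. What causes this, and would adding VAD help?
2. Also, does paraformer-zh run on the GPU? I do see GPU memory usage increase accordingly, but the README seems to say GPU support is not implemented?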

sd.default.device = 27  # device with ID 27

model = AutoModel(model="paraformer-zh-streaming")

chunk_size = [0, 20, 5]
encoder_chunk_look_back = 4  # number of chunks to lookback for encoder self-attention
decoder_chunk_look_back = 1  # number of encoder chunks to lookback for decoder cross-attention

chunk_stride = chunk_size[1] * 960  # 600ms
buffer = None  # microphone data buffer
cache = {}  # FunASR cache

stride_ratio = 3

def callback(indata, frames, time, status):
    global buffer, cache
    if buffer is None:
        buffer = indata
    else:
        buffer = np.append(buffer, indata)  # indata -> buffer
    # if len(buffer) < chunk_stride * 3:
    if len(buffer) < chunk_stride * stride_ratio:
        return
    # chunk format
    chunk = np.array([buffer[i] for i in range(0, chunk_stride * stride_ratio) if i % stride_ratio == 0])
    res = model.generate(
        input=chunk,
        cache=cache,
        is_final=True,
        chunk_size=chunk_size,
        encoder_chunk_look_back=encoder_chunk_look_back,
        decoder_chunk_look_back=decoder_chunk_look_back,
    )
    print(res)
    buffer = buffer[chunk_stride * stride_ratio:]  # drop the part of the buffer that has already been processed

with sd.InputStream(device=27, samplerate=16000, callback=callback):
    sd.sleep(600000)  # 10 minutes
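To make the numbers in the snippet above easier to reason about, here is a minimal sketch of the arithmetic behind the posted settings. It is only a sanity check, not a diagnosis. The helper names `chunk_timing` and `plan_chunks` are hypothetical, and the sketch assumes that 960 samples correspond to 60 ms at 16 kHz, that `[0, 10, 5]` is the setting the FunASR README pairs with its `# 600ms` comment, and that `is_final` is meant to mark only the last chunk of an utterance. The "speed-up" figure only applies if the input stream is mono; if the device delivers 3 interleaved channels, taking every third sample would instead pick out a single channel.

```python
SAMPLE_RATE = 16000      # Hz, as passed to sd.InputStream in the post
SAMPLES_PER_UNIT = 960   # assumed: 60 ms at 16 kHz, the factor used for chunk_stride


def chunk_timing(chunk_size, stride_ratio, sample_rate=SAMPLE_RATE):
    """Return the sample counts and durations implied by the given settings."""
    chunk_stride = chunk_size[1] * SAMPLES_PER_UNIT   # samples handed to the model
    buffered = chunk_stride * stride_ratio            # samples collected before each call
    return {
        "chunk_stride": chunk_stride,
        "chunk_ms": 1000 * chunk_stride / sample_rate,
        "buffered_samples": buffered,
        "buffered_ms": 1000 * buffered / sample_rate,
        # For a mono stream, keeping every stride_ratio-th sample squeezes
        # buffered_ms of audio into chunk_ms, i.e. the audio is sped up.
        "speedup_if_mono": buffered / chunk_stride,
    }


def plan_chunks(total_samples, chunk_stride):
    """Yield (start, end, is_final) slices; only the last slice is final."""
    total_chunks = (total_samples - 1) // chunk_stride + 1
    for i in range(total_chunks):
        start = i * chunk_stride
        end = min(start + chunk_stride, total_samples)
        yield start, end, i == total_chunks - 1


posted = chunk_timing([0, 20, 5], stride_ratio=3)        # settings from the post
readme_like = chunk_timing([0, 10, 5], stride_ratio=1)   # assumed README-style settings
print(posted)
print(readme_like)
print(list(plan_chunks(25000, readme_like["chunk_stride"])))
```

With the posted values, `chunk_stride` is 19200 samples (1200 ms rather than 600 ms), and each call waits for 57600 buffered samples. The `plan_chunks` helper shows the pattern of setting `is_final` to `True` only on the last slice, in contrast to the callback above, which passes `is_final=True` on every call.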

What's your environment?

windows 10

  • OS (e.g., Linux):
  • FunASR Version (e.g., 1.0.0):
  • ModelScope Version (e.g., 1.11.0):
  • PyTorch Version (e.g., 2.0.0):
  • How you installed funasr (pip, source):
  • Python version:
  • GPU (e.g., V100M32)
  • CUDA/cuDNN version (e.g., cuda11.7):
  • Docker version (e.g., funasr-runtime-sdk-cpu-0.4.1)
  • Any other relevant information:

Joseph513shen, May 22 '25 01:05