
Streaming recognition from a microphone performs poorly, with frequent misrecognitions

Open · Joseph513shen opened this issue 7 months ago · 8 comments

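1. I convert the streaming audio from the microphone into chunks and feed them to the model, but recognition is much worse than with a local WAV file, and there are many misrecognitions. What could be causing this? Would adding VAD help?

2. Also, does paraformer-zh run on the GPU? GPU memory usage does go up accordingly, but the README seems to say GPU support is not implemented.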
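On the VAD question: FunASR ships a trained VAD model (`fsmn-vad`), but the basic idea can be illustrated with something much simpler. The sketch below is a hypothetical energy gate, not FunASR's VAD. It measures a chunk's RMS level in dBFS and treats the chunk as speech only above a threshold, so near-silent chunks could be skipped instead of being sent to the recognizer. The function names and the -45 dBFS default are assumptions that would need tuning per microphone.

```python
import numpy as np


def rms_dbfs(chunk: np.ndarray) -> float:
    """Return the RMS level of a float chunk (samples in [-1, 1]) in dBFS."""
    rms = np.sqrt(np.mean(np.square(chunk.astype(np.float64))))
    return float(20.0 * np.log10(max(rms, 1e-10)))  # floor avoids log10(0)


def is_speech(chunk: np.ndarray, threshold_dbfs: float = -45.0) -> bool:
    """Crude energy gate: True if the chunk is louder than the threshold.

    The -45 dBFS default is an arbitrary assumption, not a FunASR setting.
    """
    return bool(rms_dbfs(chunk) > threshold_dbfs)


# Example: one second of near-silence vs. a 440 Hz tone at 16 kHz
sr = 16000
t = np.arange(sr) / sr
silence = np.random.default_rng(0).normal(0.0, 1e-4, sr)  # about -80 dBFS
tone = 0.1 * np.sin(2 * np.pi * 440 * t)                   # about -23 dBFS
print(is_speech(silence), is_speech(tone))  # False True
```

An energy gate only measures loudness, so steady background noise above the threshold still passes through. It also does nothing about a wrong sample rate or channel layout upstream of the model.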

    import numpy as np
    import sounddevice as sd
    from funasr import AutoModel

    sd.default.device = 27  # use the input device with ID 27

    model = AutoModel(model="paraformer-zh-streaming")

    chunk_size = [0, 20, 5]
    encoder_chunk_look_back = 4  # number of chunks to look back for encoder self-attention
    decoder_chunk_look_back = 1  # number of encoder chunks to look back for decoder cross-attention

    chunk_stride = chunk_size[1] * 960  # 20 * 960 = 19200 samples, i.e. 1200 ms at 16 kHz
    buffer = None  # microphone sample buffer
    cache = {}  # FunASR streaming cache

    stride_ratio = 3  # keep one sample out of every 3

    def callback(indata, frames, time, status):
        global buffer, cache
        # indata has shape (frames, channels); np.append flattens it,
        # interleaving the channels if the stream has more than one
        if buffer is None:
            buffer = indata
        else:
            buffer = np.append(buffer, indata)
        if len(buffer) < chunk_stride * stride_ratio:
            return
        # Keep every stride_ratio-th sample without low-pass filtering. The stream
        # is already opened at 16 kHz, so this yields about 16000 / 3 = 5333 Hz audio.
        chunk = buffer[: chunk_stride * stride_ratio : stride_ratio]
        # model.generate runs inside the audio callback, with is_final=True on every chunk
        res = model.generate(
            input=chunk,
            cache=cache,
            is_final=True,
            chunk_size=chunk_size,
            encoder_chunk_look_back=encoder_chunk_look_back,
            decoder_chunk_look_back=decoder_chunk_look_back,
        )
        print(res)
        buffer = buffer[chunk_stride * stride_ratio :]  # drop the samples already decoded

    # channels is not set, so sounddevice uses the device's maximum input channel count
    with sd.InputStream(device=27, samplerate=16000, callback=callback):
        sd.sleep(600000)  # 10 minutes
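Two details of the code above differ from FunASR's README streaming example. In that example, the same `cache` dict is reused across calls and `is_final` is True only for the last chunk. The example also reads from a file, so it has no real-time constraint. Inside a PortAudio callback, a slow `model.generate` call can delay the next callback, and the input may then overflow (sounddevice reports this through the `status` argument). The sketch below assumes a mono 16 kHz stream and separates the two roles with a `queue.Queue`: the callback only copies samples, and a worker loop assembles fixed-size chunks and hands them to a decode function. `make_callback`, `decode_loop`, and `decode_fn` are hypothetical names; in practice `decode_fn` would wrap `model.generate(input=chunk, cache=cache, is_final=is_final, ...)`.

```python
import queue
import threading
from typing import Callable

import numpy as np


def make_callback(q: "queue.Queue[np.ndarray]") -> Callable:
    """Build a sounddevice-style callback that only copies channel 0 into a queue."""
    def callback(indata, frames, time, status):
        if status:
            print(status)  # e.g. input overflow if the callback falls behind
        q.put(indata[:, 0].copy())  # mono samples at the stream's own sample rate
    return callback


def decode_loop(q: "queue.Queue[np.ndarray]", chunk_stride: int,
                decode_fn: Callable, stop: threading.Event) -> None:
    """Collect samples from q into chunks of chunk_stride samples and decode them.

    decode_fn(chunk, is_final) is a hypothetical hook. is_final is True only for
    the last (possibly shorter) chunk, flushed once stop is set and q is drained.
    """
    pending = np.zeros(0, dtype=np.float32)
    while True:
        try:
            pending = np.concatenate([pending, q.get(timeout=0.1)])
        except queue.Empty:
            if stop.is_set():
                break
            continue
        while len(pending) >= chunk_stride:
            decode_fn(pending[:chunk_stride], False)
            pending = pending[chunk_stride:]
    decode_fn(pending, True)  # flush the tail and let the model finalize


# Usage with simulated callback blocks instead of a real microphone
q: "queue.Queue[np.ndarray]" = queue.Queue()
cb = make_callback(q)
for _ in range(25):  # 25 blocks of 1600 frames = 40000 samples = 2.5 s at 16 kHz
    cb(np.zeros((1600, 1), dtype=np.float32), 1600, None, None)

calls = []
stop = threading.Event()
stop.set()  # no more audio will arrive
decode_loop(q, chunk_stride=9600, decode_fn=lambda c, f: calls.append((len(c), f)), stop=stop)
print(calls)  # [(9600, False), (9600, False), (9600, False), (9600, False), (1600, True)]
```

With a real microphone, `decode_loop` would run in a separate `threading.Thread` while `sd.InputStream(samplerate=16000, channels=1, callback=cb)` is open, and `stop.set()` would be called after the stream closes.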

What's your environment?

OS: Windows 10


Joseph513shen · May 22 '25

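However, when I save the same utterance as a local WAV file and feed it to the model, recognition works fine. As far as I can tell, a real-time microphone stream and a WAV file are both essentially ndarrays, so what is the difference? The sample rate?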

Joseph513shen · May 22 '25
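Even when both are ndarrays, a microphone buffer and a decoded WAV file can differ in three ways that matter to a 16 kHz model. The first is sample rate: samples captured at one rate but interpreted at 16 kHz play back faster or slower, with shifted pitch. The second is channel layout. The third is sample format, since int16 values and floats in [-1, 1] are not interchangeable. The sketch below, with a hypothetical helper `to_model_input`, normalizes all three. It assumes an integer ratio between the capture rate and 16 kHz and applies a Hamming-windowed sinc FIR low-pass before keeping every factor-th sample, so that content above the new Nyquist frequency does not alias. It processes a whole array at once; filtering chunk by chunk would also need to carry filter state across chunk boundaries.

```python
import numpy as np


def lowpass_decimate(x: np.ndarray, factor: int, taps: int = 101) -> np.ndarray:
    """Low-pass filter with a windowed-sinc FIR, then keep every factor-th sample."""
    if factor == 1:
        return x
    cutoff = 0.45 / factor  # cycles/sample, slightly below the new Nyquist (0.5 / factor)
    n = np.arange(taps) - (taps - 1) / 2
    h = 2 * cutoff * np.sinc(2 * cutoff * n) * np.hamming(taps)
    h /= h.sum()  # unity gain at DC
    return np.convolve(x, h, mode="same")[::factor]


def to_model_input(samples: np.ndarray, sample_rate: int, target_rate: int = 16000) -> np.ndarray:
    """Hypothetical helper: return mono float32 in [-1, 1] at target_rate."""
    x = np.asarray(samples)
    if np.issubdtype(x.dtype, np.integer):
        x = x.astype(np.float32) / 32768.0  # assumes 16-bit PCM
    x = x.astype(np.float32)
    if x.ndim == 2:  # (frames, channels) -> mono
        x = x.mean(axis=1)
    if sample_rate % target_rate != 0:
        raise ValueError("this sketch only handles integer rate ratios")
    return lowpass_decimate(x, sample_rate // target_rate).astype(np.float32)


# 1 s of a stereo int16 tone at 48 kHz -> 16000 mono float32 samples
sr = 48000
t = np.arange(sr) / sr
tone = (0.3 * 32767 * np.sin(2 * np.pi * 440 * t)).astype(np.int16)
stereo = np.stack([tone, tone], axis=1)
y = to_model_input(stereo, sr)
print(y.shape, y.dtype)  # (16000,) float32
```

For the stream in the issue, which is already opened with `samplerate=16000`, the factor would be 1 and no samples would be dropped.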

[Image]

Joseph513shen · May 22 '25

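It depends on how well the microphone picks up sound; you can save the captured audio and listen to it. Was the local WAV file converted from the microphone data?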

Spadger-dev · May 22 '25
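Following the suggestion to save the microphone data and listen to it, here is a minimal sketch using Python's standard `wave` module. It writes mono float samples in [-1, 1] as 16-bit PCM. The function name `save_wav`, the file path, and the assumption that the buffer holds mono float samples are illustrative. The most telling check is to write the exact array passed to `model.generate` (after any decimation) at 16 kHz, the rate the model assumes, since that is what the model actually hears.

```python
import os
import tempfile
import wave

import numpy as np


def save_wav(path: str, samples: np.ndarray, sample_rate: int = 16000) -> None:
    """Write mono float samples in [-1, 1] to a 16-bit PCM WAV file."""
    pcm = np.clip(samples, -1.0, 1.0)
    pcm = (pcm * 32767).astype("<i2")  # little-endian int16, as WAV expects
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)  # bytes per sample
        wf.setframerate(sample_rate)
        wf.writeframes(pcm.tobytes())


# Example: 0.5 s of a 440 Hz tone, written to a temporary file and read back
sr = 16000
t = np.arange(sr // 2) / sr
path = os.path.join(tempfile.gettempdir(), "funasr_check.wav")  # hypothetical path
save_wav(path, 0.2 * np.sin(2 * np.pi * 440 * t), sr)
with wave.open(path, "rb") as wf:
    print(wf.getnchannels(), wf.getframerate(), wf.getnframes())  # 1 16000 8000
```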

Is there any way to improve this? I'm also doing streaming recognition from a microphone, and the error rate is very high.

sandsc · May 27 '25

+1

hhm152800 · May 28 '25

+1

QiushiStaff · Jun 04 '25

+1

biao-lvwan · Jul 03 '25

I'm using local audio with 600 ms chunks, and the recognition error rate is also very high.

zhishao · Oct 11 '25
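For reference, FunASR's README describes each unit of `chunk_size[1]` as 60 ms, which is 960 samples at 16 kHz. So `[0, 10, 5]` gives 600 ms (9600 samples), and the `[0, 20, 5]` used in the original post gives 1200 ms (19200 samples). The sketch below splits a whole 16 kHz array into such chunks and marks only the last one as final, mirroring the loop in FunASR's streaming example. `iter_chunks` and `chunk_stride` are hypothetical helper names. Each `(chunk, is_final)` pair would be passed to `model.generate` together with the same `cache` dict.

```python
import math
from typing import Iterator, Sequence, Tuple

import numpy as np

SAMPLES_PER_UNIT = 960  # 60 ms at 16 kHz


def chunk_stride(chunk_size: Sequence[int]) -> int:
    """Number of 16 kHz samples per streaming step."""
    return chunk_size[1] * SAMPLES_PER_UNIT


def iter_chunks(speech: np.ndarray, chunk_size: Sequence[int]) -> Iterator[Tuple[np.ndarray, bool]]:
    """Yield (chunk, is_final) pairs; is_final is True only for the last chunk."""
    stride = chunk_stride(chunk_size)
    total = max(1, math.ceil(len(speech) / stride))
    for i in range(total):
        yield speech[i * stride:(i + 1) * stride], i == total - 1


speech = np.zeros(48000, dtype=np.float32)  # 3 s of (silent) audio at 16 kHz
print([(len(c), f) for c, f in iter_chunks(speech, [0, 10, 5])])
# [(9600, False), (9600, False), (9600, False), (9600, False), (9600, True)]
print(chunk_stride([0, 20, 5]) / 16)  # 1200.0 (milliseconds)
```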