FunASR issues

How is the FP16 model trained?

Notice: To resolve issues more efficiently, please raise your issue following the template and fill in the details. ## ❓How is the FP16 model trained? Can I save the FP16 model as a...

icestoneking

question

Fp16 inference, forward got NaN

4

## 🐛 Bug When running inference with the official example code in FP16, the forward pass produces NaN. ### To Reproduce Steps to reproduce the behavior (**always include the...

lzl-mt

bug
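
The report above is truncated, so the actual cause of the NaN is not known. As general background only, the sketch below shows one common way NaN appears in half precision: an intermediate value exceeds the FP16 maximum (65504), becomes `inf`, and a later `inf / inf` yields NaN. It simulates FP16 rounding with Python's `struct` format `e`. The helper names (`to_fp16`, `softmax_fp16_naive`, `softmax_fp16_stable`) are hypothetical and are not part of FunASR.

```python
import math
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value.

    Finite values too large for FP16 become +/-inf, mimicking overflow.
    """
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        return math.copysign(math.inf, x)


def softmax_fp16_naive(logits):
    """Softmax with every intermediate rounded to FP16, no max subtraction."""
    exps = [to_fp16(math.exp(to_fp16(v))) for v in logits]
    total = to_fp16(sum(exps))
    # If any exp overflowed, total is inf and inf / inf gives NaN.
    return [to_fp16(e / total) for e in exps]


def softmax_fp16_stable(logits):
    """Same computation, but subtracting the max keeps exp() within range."""
    peak = max(logits)
    exps = [to_fp16(math.exp(to_fp16(v - peak))) for v in logits]
    total = to_fp16(sum(exps))
    return [to_fp16(e / total) for e in exps]


if __name__ == "__main__":
    logits = [12.0, 11.0, 0.0]  # exp(12) is larger than the FP16 maximum
    print("naive :", softmax_fp16_naive(logits))
    print("stable:", softmax_fp16_stable(logits))
```

Whether this mechanism applies to the model in the report is an open question; checking intermediate activations for `inf` is one way to find out.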

Can FunASR currently support a dynamic correction feature similar to iFlytek's?

1

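FunASR is currently not very accurate on speech that is slightly unclear or accented. I found that the same files are recognized correctly with iFlytek's dynamic correction. Does FunASR support a similar feature? If not, is there a proposal or plan to implement it?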

yufeng1684

question

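Loading the Python websocket server is very slow and takes a long time

When running FunASR/runtime/python/websocket/funasr_wss_server.py, it takes more than 15 minutes to finish loading the models. No parameters were modified.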

dkzyh87

question

Sample rate question

1

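## ❓ Questions and Help With the speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-onnx family of models, is real-time transcription limited to audio sampled at 16 kHz? When I tried to transcribe my computer's internal audio stream in real time, I included "audio_fs": 48000 in the initial JSON message, but the response did not change at all and was still wrong. Is there a way to recognize audio at other sample rates? (I know that an uploaded wav file can be resampled based on its file information, but I did not see anything related for real-time binary transcription.) I am looking for a solution for real-time transcription of a binary byte stream.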

dd123-a

question
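
The listing does not show how this question was answered, so whether the server honors `audio_fs` for binary streams remains unconfirmed. One possible client-side workaround is to convert the stream to 16 kHz before sending it. The sketch below assumes 16-bit little-endian mono PCM at exactly 48 kHz and averages every three samples, which is a crude low-pass filter followed by decimation. The class name `Downsampler48kTo16k` is hypothetical and not part of FunASR; a dedicated resampling library would give better audio quality.

```python
import struct


class Downsampler48kTo16k:
    """Convert 16-bit little-endian mono PCM chunks from 48 kHz to 16 kHz.

    Each output sample is the average of three input samples. Bytes that
    do not yet form a full group of three samples are kept for the next
    call, so incoming chunk sizes do not need to align.
    """

    BYTES_PER_GROUP = 6  # 3 samples * 2 bytes per sample

    def __init__(self) -> None:
        self._pending = b""

    def process(self, chunk: bytes) -> bytes:
        data = self._pending + chunk
        usable = len(data) - len(data) % self.BYTES_PER_GROUP
        self._pending = data[usable:]

        count = usable // 2
        samples = struct.unpack(f"<{count}h", data[:usable])
        # Average each group of three samples, truncating toward zero.
        averaged = [
            int((samples[i] + samples[i + 1] + samples[i + 2]) / 3)
            for i in range(0, count, 3)
        ]
        return struct.pack(f"<{len(averaged)}h", *averaged)


if __name__ == "__main__":
    resampler = Downsampler48kTo16k()
    pcm_48k = struct.pack("<6h", 300, 300, 300, -600, -600, -600)
    pcm_16k = resampler.process(pcm_48k)
    print(struct.unpack("<2h", pcm_16k))
```

Each chunk captured from the audio device would be passed through `process` and the returned bytes sent over the websocket in place of the original chunk.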

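In the websocket protocol documentation, the is_final field serves no purpose in offline mode

The content in question is in [websocket_protocol.md](https://github.com/modelscope/FunASR/blob/main/runtime/docs/websocket_protocol_zh.md), in the following sections (from outermost to innermost) - `离线文件转写` (offline file transcription) - `从服务端往客户端发数据` (sending data from the server to the client) - `发送识别结果` (sending recognition results) - `参数介绍` (parameter description). The `参数介绍` section describes the `is_final` field as "indicates that recognition has finished". In offline mode this field always returns False; True has never appeared. After reading the code of several clients, I found that in `offline` mode the client does not care about this field. These clients typically close the ws connection as soon as they receive one websocket package. Suggestion: reflect this in the documentation and state the correct client behavior for offline mode. Below is my own modified version, for reference only:...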

Tryanks

documentation
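
The client behavior described in this issue can be sketched as a small decision function. This is an illustration of the reporter's observation, not FunASR client code: the function name `should_close_after` and the `requested_mode` parameter are hypothetical, and treating `is_final` as reliable outside offline mode is an assumption the issue does not address.

```python
import json


def should_close_after(raw_message: str, requested_mode: str) -> bool:
    """Decide whether a client may close the websocket after this message.

    requested_mode is the mode the client asked for when it opened the
    session (for example "offline").
    """
    result = json.loads(raw_message)
    if requested_mode == "offline":
        # Per the issue, is_final stays False in offline mode, so the
        # first result message is treated as the complete transcript.
        return True
    # Assumption: in other modes, is_final marks the last message.
    return bool(result.get("is_final", False))


if __name__ == "__main__":
    message = json.dumps({"text": "hello", "is_final": False})
    print(should_close_after(message, "offline"))
    print(should_close_after(message, "online"))
```

A client that instead waits for `is_final` to become True in offline mode would, by the reporter's account, wait indefinitely.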

Timestamps obtained from the funasr library are wrong

5

Library versions:
```
funasr 1.1.4
modelscope 1.16.1
```
Code run:
```
from modelscope.pipelines import pipeline
from modelscope.utils.constant import Tasks

if __name__ == '__main__':
    audio_in = 'https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/asr_speaker_demo.wav'
    output_dir = "./results"
    inference_pipeline = pipeline(...
```

Lixi20

bug

paraformer-en model transcription results are completely wrong

2

## 🐛 Bug The paraformer-en model's transcription results are completely wrong, while the Chinese model works fine ![image](https://github.com/modelscope/FunASR/assets/32347475/769c6a73-8367-439c-be2d-fdbed526b893) #### Code sample from funasr import AutoModel model_en = AutoModel(model="paraformer-en", disable_log=True, disable_pbar=True) model_en.generate(input="file.mp3", batch_size_s=300, is_final=True) ### Expected behavior Fix the English model's transcription errors ### Environment - Linux: -...

LuffyGT

bug

The load_audio_text_image_video function has a bug when processing audio that is 1 sample long

2

## 🐛 Bug I am using the paraformer-zh-large-stream model for real-time (streaming) speech recognition on a batch of audio files. Below is the code I used (following the template recommended on modelscope): ``` # From https://www.modelscope.cn/models/iic/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-online from funasr import AutoModel...

viewlei

bug

Why does the batch size significantly affect the recognition results of iic/speech_paraformer-large-vad-punc_asr_nat-en-16k-common-vocab10020 while other models are not significantly affected? The long-audio version of paraformer-en is overly sensitive to batch size (or rather, to the padding operation), which severely affects recognition results

Why does the batch size significantly affect the recognition results of this model while other models are not significantly affected? The recognition results of this long-audio version of the model are especially sensitive to batch size (mainly to the padding operation), while other versions are fine...

283258771

bug

question
