[S2T] PaddleSpeech-Server RESTful API does not recognize pcm format, and the punc parameter has no effect

Open mzgcz opened this issue 1 year ago • 4 comments

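Describe the bug: The PaddleSpeech-Server documentation says the speech recognition service supports both pcm and wav formats, but submitting a pcm file produces the following error: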

raise LibsndfileError(err, prefix="Error opening {0!r}: ".format(self.name))
soundfile.LibsndfileError: Error opening <_io.BytesIO object at 0x7f9bc43d1b30>: Format not recognised.
[2023-12-01 15:54:05,375] [ ERROR] - can not open the audio file, please check the audio file(<_io.BytesIO object at 0x7f9bc43d1b30>) format is 'wav'.
you can try to use sox to change the file format.
For example:
sample rate: 16k
sox input_audio.xx --rate 16k --bits 16 --channels 1 output_audio.wav
sample rate: 8k
sox input_audio.xx --rate 8k --bits 16 --channels 1 output_audio.wav

[2023-12-01 15:54:05,375] [ ERROR] - file check failed!

mzgcz, Dec 01 '23 08:12

Pass the audio_format field:
{
  "audio": "exSI6ICJlbiIsCgkgICAgInBvc2l0aW9uIjogImZhbHNlIgoJf...",
  "audio_format": "pcm",
  "sample_rate": 16000,
  "lang": "zh_cn",
  "punc": 0
}
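For readers who want to reproduce this, a request body of that shape could be assembled roughly as follows. This is only a sketch: the helper name `build_asr_payload` is hypothetical, the field names are copied from the JSON in this comment, and base64 encoding of the `audio` field is an assumption inferred from the `base64_string` variable used later in the thread. The endpoint and HTTP call are deliberately left out because the thread does not state them.

```python
import base64
import json


def build_asr_payload(audio_bytes: bytes, audio_format: str = "pcm",
                      sample_rate: int = 16000, lang: str = "zh_cn",
                      punc: int = 0) -> str:
    """Return a JSON request body shaped like the one quoted in the thread.

    Hypothetical helper: only the field names come from the issue; the
    base64 encoding of "audio" is an assumption.
    """
    body = {
        "audio": base64.b64encode(audio_bytes).decode("ascii"),
        "audio_format": audio_format,
        "sample_rate": sample_rate,
        "lang": lang,
        "punc": punc,
    }
    return json.dumps(body)


# Four bytes standing in for two 16-bit PCM samples (illustrative only).
payload = build_asr_payload(b"\x00\x00\x01\x00")
```

The returned string can be sent as the body of a POST request with whatever HTTP client is already in use.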

zxcd, Jan 02 '24 12:01

Passing the punc parameter has no effect for either wav or pcm input; no punctuation is added to the result.

beixiang-l, Jan 08 '24 08:01

> Pass the audio_format field: { "audio": "exSI6ICJlbiIsCgkgICAgInBvc2l0aW9uIjogImZhbHNlIgoJf...", "audio_format": "pcm", "sample_rate": 16000, "lang": "zh_cn", "punc": 0 }

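Let me confirm: when "audio_format" is "pcm", the audio field is supposed to carry a raw pcm payload, right? I ask because a wav payload works without any problem. I also checked the code, and it only handles wav payloads; there is no handling for raw pcm payloads.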
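If the server really does accept only wav payloads, one possible client-side workaround is to wrap the headerless PCM samples in a WAV container before encoding them. The sketch below uses Python's standard `wave` module; the function name `pcm_to_wav_bytes` is hypothetical, and the defaults (16-bit, mono, 16 kHz) are assumptions chosen to match the sox example in the error log, not values confirmed by the project.

```python
import io
import wave


def pcm_to_wav_bytes(pcm: bytes, sample_rate: int = 16000,
                     channels: int = 1, sample_width: int = 2) -> bytes:
    """Wrap headerless PCM samples in a WAV (RIFF) container in memory.

    Assumes the PCM data is little-endian signed integers, which is what
    the WAV format expects for 16-bit samples.
    """
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wf:
        wf.setnchannels(channels)
        wf.setsampwidth(sample_width)  # bytes per sample; 2 means 16-bit
        wf.setframerate(sample_rate)
        wf.writeframes(pcm)
    return buf.getvalue()


# 160 silent 16-bit mono samples (10 ms at 16 kHz), purely illustrative.
wav_bytes = pcm_to_wav_bytes(b"\x00\x00" * 160)
```

The resulting bytes could then be sent with "audio_format": "wav" in place of the raw pcm payload. This sidesteps the format detection failure rather than fixing pcm support in the server.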

mzgcz, Jan 09 '24 03:01

I tried it, and the punc parameter has no effect with wav input either...
data = {
    "audio": base64_string,
    "audio_format": "wav",
    "sample_rate": 32000,
    "lang": "zh_cn",
    "punc": True
}

warkcod, Jun 13 '24 03:06