yfq issues

41 issues


Thanks for sharing! Quick question: what does the `audio` parameter (speaker reference audio) mean?

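The `speaker` parameter selects the voice speaker. How does it differ from `audio` (reference audio)? Is this voice cloning? An example that uses both parameters would be ideal.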

Fixed some bugs on top of the author's work

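Thanks for sharing. I found that recognizing certain images raised errors (now solved). On images I scraped from the web, recognition is not accurate enough, so I plan to switch to a different recognition model. Thanks again!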

Saving recognition results locally via the Docker image service (no GPU build required)

With the Docker image service approach, how can I upload an image programmatically and save the recognition result locally?
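The issue does not show the container's HTTP interface, so the following is only a minimal sketch using the standard library. The endpoint `SERVICE_URL`, the JSON field name `image`, and the choice to send the image base64-encoded are all hypothetical assumptions; adjust them to whatever the service in the container actually expects.

```python
import base64
import json
import urllib.request
from pathlib import Path

# Hypothetical endpoint: replace with the host/port/path the container exposes.
SERVICE_URL = "http://127.0.0.1:8080/ocr"


def build_payload(image_bytes: bytes) -> bytes:
    """Wrap the image as base64 in a JSON body (field name 'image' is assumed)."""
    body = {"image": base64.b64encode(image_bytes).decode("ascii")}
    return json.dumps(body).encode("utf-8")


def recognize(image_path: str, url: str = SERVICE_URL, timeout: float = 30.0) -> dict:
    """POST one image to the (assumed) OCR service and return its JSON reply."""
    req = urllib.request.Request(
        url,
        data=build_payload(Path(image_path).read_bytes()),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def save_result(result: dict, out_path: str) -> None:
    """Write the result as UTF-8 JSON, keeping Chinese text readable."""
    Path(out_path).write_text(
        json.dumps(result, ensure_ascii=False, indent=2), encoding="utf-8"
    )
```

Typical use would be `save_result(recognize("test.jpg"), "test.json")`, looping over a folder if several images need processing.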

How to set query conditions for "milvus_search"

### Is there an existing issue for this?
- [X] I have searched the existing issues.
### Is your feature request related to a problem? Please describe.
yes
### Describe...

kind/feature
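How `milvus_search` passes filters is not shown in the issue. Assuming it ultimately calls pymilvus, Milvus applies scalar filtering through a boolean expression string (the `expr` argument of `Collection.search`, or `filter` on `MilvusClient.search`). A minimal sketch of a helper that builds such a string from a dict; the field names in the example are hypothetical and field names are not validated.

```python
import json

# Comparison operators accepted in Milvus boolean expressions.
_OPS = {"gt": ">", "gte": ">=", "lt": "<", "lte": "<=", "ne": "!="}


def _literal(value):
    """Render a Python scalar as a Milvus expression literal."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, str):
        return json.dumps(value, ensure_ascii=False)  # double-quoted and escaped
    if isinstance(value, (int, float)):
        return repr(value)
    raise TypeError(f"unsupported literal type: {type(value).__name__}")


def build_expr(filters):
    """Turn {field: condition} into one filter string joined with 'and'.

    - scalar      -> field == value
    - list/tuple  -> field in [v1, v2, ...]
    - dict        -> inequalities, e.g. {"gte": 2020, "lt": 2024}
    """
    clauses = []
    for field, cond in filters.items():
        if isinstance(cond, dict):
            for op, value in cond.items():
                if op not in _OPS:
                    raise ValueError(f"unknown operator {op!r} for field {field!r}")
                clauses.append(f"{field} {_OPS[op]} {_literal(value)}")
        elif isinstance(cond, (list, tuple)):
            items = ", ".join(_literal(v) for v in cond)
            clauses.append(f"{field} in [{items}]")
        else:
            clauses.append(f"{field} == {_literal(cond)}")
    return " and ".join(clauses)
```

For example, `build_expr({"category": ["a", "b"], "year": {"gte": 2020, "lt": 2024}})` yields `category in ["a", "b"] and year >= 2020 and year < 2024`, which could then be passed as `collection.search(data=[query_vector], anns_field="embedding", param={"metric_type": "L2", "params": {"nprobe": 10}}, limit=10, expr=expr)` (collection, vector, and field names are placeholders).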

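Using the Python API, long images are recognized accurately after adding and adjusting a few parameters:
```
from paddleocr import PaddleOCR

# 'min' + a small side length keeps thin, long images from being shrunk too far
ocr = PaddleOCR(use_angle_cls=False, lang="ch", det_limit_type='min', det_limit_side_len=100, use_gpu=True)
result = ocr.ocr('test.jpg')
```
However, when deploying and testing with [PPOCR serving deployment](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.4/deploy/pdserving/README_CN.md), long images are no longer recognized accurately. How should the configuration parameters be changed so that the serving deployment also handles long images?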
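A likely reason the Python API and the serving deployment disagree is the resize applied before text detection, which `det_limit_type` and `det_limit_side_len` control. The sketch below reproduces that resize rule as I understand it from PaddleOCR's detection preprocessing (scale by the long or short side, then round each side to a multiple of 32); it is an approximation for reasoning about sizes, not PaddleOCR's own code.

```python
def det_resize_shape(h, w, limit_side_len=960, limit_type="max"):
    """Return the (height, width) the text detector would see.

    Approximates PaddleOCR's detection resize driven by
    det_limit_type / det_limit_side_len.
    """
    if limit_type == "max":
        # Shrink only if the LONG side exceeds the limit.
        longest = max(h, w)
        ratio = limit_side_len / longest if longest > limit_side_len else 1.0
    elif limit_type == "min":
        # Enlarge only if the SHORT side is below the limit.
        shortest = min(h, w)
        ratio = limit_side_len / shortest if shortest < limit_side_len else 1.0
    else:
        raise ValueError(f"unsupported limit_type: {limit_type!r}")
    resize_h = int(h * ratio)
    resize_w = int(w * ratio)
    # Round each side to the nearest multiple of 32, with a minimum of 32.
    resize_h = max(int(round(resize_h / 32) * 32), 32)
    resize_w = max(int(round(resize_w / 32) * 32), 32)
    return resize_h, resize_w
```

For a 100 x 3000 strip, the `max`/960 setting gives `(32, 960)`, squeezing the text to 32 px high, while `min`/100 gives `(96, 3008)`. So the serving side would need its detection preprocessing resize configured with the same limit type and side length; exactly where that lives in the pdserving code (for example the detection op in `web_service.py`) is an assumption worth checking against the release/2.4 source.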

Why does predicting the same image give different scores?

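The score from the NSFW image detection feature at http://ai.midday.me differs from the score I get by manually running `python nsfw_predict.py` with the image path, sometimes by so much that the image lands in a different class. Are the models different? The online detection seems to work better. If the models differ, could you share the online model? Thanks!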

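Could the author post a progress update? Or does anyone know how to download the politically sensitive and terrorism-related data?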

Could the author share their environment?

Long-audio recognition: inference with ModelScope throws an error

Following the demo:
* [Paraformer speech recognition - Chinese - general - 16k - offline - large - long-audio version](https://www.modelscope.cn/models/damo/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-pytorch/summary)

Environment:
* The official ModelScope GPU image (other demos from the model hub run fine in it)

Code (identical to the example):
```
from modelscope.pipelines import pipeline
from modelscope.utils.constant import Tasks

inference_pipeline = pipeline(
    task=Tasks.auto_speech_recognition,
    model='damo/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-pytorch',
    vad_model='damo/speech_fsmn_vad_zh-cn-16k-common-pytorch',
    vad_model_revision="v1.1.8",
    punc_model='damo/punc_ct-transformer_zh-cn-common-vocab272727-pytorch',
    punc_model_revision="v1.1.6")

rec_result = inference_pipeline(audio_in='https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/asr_vad_punc_example.wav')
```
...
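The error message itself is cut off, so this is only one sanity check worth running when testing with your own files: the model id says `16k`, so the input should be 16 kHz audio. The sketch below inspects a local WAV file with the standard `wave` module; treating mono 16-bit PCM as the expected format is an additional assumption, and remote `audio_in` URLs would need to be downloaded first.

```python
import wave

EXPECTED_RATE = 16000  # taken from "16k" in the model id


def check_wav(path, expected_rate=EXPECTED_RATE):
    """Return (info, problems) for a local PCM WAV file."""
    with wave.open(path, "rb") as wf:
        info = {
            "sample_rate": wf.getframerate(),
            "channels": wf.getnchannels(),
            "sample_width": wf.getsampwidth(),
            "duration_s": wf.getnframes() / wf.getframerate(),
        }
    problems = []
    if info["sample_rate"] != expected_rate:
        problems.append(f"sample rate {info['sample_rate']} Hz, expected {expected_rate} Hz")
    if info["channels"] != 1:  # assumption: mono input
        problems.append(f"{info['channels']} channels, expected mono")
    if info["sample_width"] != 2:  # assumption: 16-bit PCM
        problems.append(f"{8 * info['sample_width']}-bit samples, expected 16-bit")
    return info, problems
```

If `problems` is non-empty, resampling or downmixing the file before passing it to `inference_pipeline` rules out the most basic format mismatch.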

How to manually add pause markers in speech synthesis

```
from modelscope.outputs import OutputKeys
from modelscope.pipelines import pipeline
from modelscope.utils.constant import Tasks

model_id = 'damo/speech_sambert-hifigan_tts_zh-cn_16k'
sambert_hifigan_tts = pipeline(task=Tasks.text_to_speech, model=model_id)
text1 = "你好呀，赛利亚"  # "Hello there, Seria"
text2 = "我很好"  # "I'm fine"
# plain ASCII quotes: the original curly quotes around zhitian_emo are a syntax error
output1 = sambert_hifigan_tts(input=text1, voice="zhitian_emo")
```
...
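One workaround that avoids any in-text pause syntax (this is not a ModelScope feature) is to synthesize each sentence separately, as `text1` and `text2` already suggest, and join the audio with a stretch of silence. A minimal sketch with the standard `wave` module, assuming each output's `OutputKeys.OUTPUT_WAV` value holds the bytes of a complete PCM WAV file:

```python
import io
import wave


def concat_with_pauses(wav_chunks, pause_s=0.5):
    """Join PCM WAV byte strings, inserting pause_s seconds of silence between them."""
    if not wav_chunks:
        raise ValueError("need at least one WAV chunk")
    params = None
    frames = []
    for chunk in wav_chunks:
        with wave.open(io.BytesIO(chunk), "rb") as wf:
            p = (wf.getnchannels(), wf.getsampwidth(), wf.getframerate())
            if params is None:
                params = p
            elif p != params:
                raise ValueError("all chunks must share channels, sample width and rate")
            frames.append(wf.readframes(wf.getnframes()))
    nchannels, sampwidth, rate = params
    # Silence is 0x80 for unsigned 8-bit PCM and zero bytes for signed wider PCM.
    fill = b"\x80" if sampwidth == 1 else b"\x00"
    silence = fill * (int(rate * pause_s) * nchannels * sampwidth)
    out = io.BytesIO()
    with wave.open(out, "wb") as wf:
        wf.setnchannels(nchannels)
        wf.setsampwidth(sampwidth)
        wf.setframerate(rate)
        wf.writeframes(silence.join(frames))
    return out.getvalue()
```

With a second call producing a hypothetical `output2` for `text2`, usage would be `wav = concat_with_pauses([output1[OutputKeys.OUTPUT_WAV], output2[OutputKeys.OUTPUT_WAV]], pause_s=0.8)`, then writing `wav` to a `.wav` file in binary mode.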