PaddleSpeech

Easy-to-use Speech Toolkit including Self-Supervised Learning model, SOTA/Streaming ASR with punctuation, Streaming TTS with text frontend, Speaker Verification System, End-to-End Speech Translation a...

Results 289 PaddleSpeech issues

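Following the README in PaddleSpeech/examples/other/tts_finetune/tts3/, with the mixed Chinese-English model: if I fine-tune the AM (acoustic model) on 3k utterances selected from BZNSYP, the loss drops to about 0.7 and speech synthesized with the fine-tuned model is fairly clear. Fine-tuning on multiple utterances from a single speaker in the aishell3 dataset likewise gives clear synthesized speech with no hissing noise. However, with the same method, I selected 250 utterances from one speaker in thchs30 for fine-tuning, and the speech synthesized after fine-tuning has a hissing noise. I then selected 1000 utterances with the same speaker's timbre from thchs30; after fine-tuning the loss is still around 1.5 and the synthesized speech still has the hissing noise, although the model does learn the timbre of the fine-tuning data. Could anyone tell me what is going wrong here?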

Bug
T2S

Following `examples/wenetspeech/asr1/README.md`, I tried to convert the model in `asr1_chunk_conformer_u2pp_wenetspeech_ckpt_1.3.0.model.tar.gz`. The steps were: 1. Enter the directory containing the conversion script: `cd examples/wenetspeech/asr1/` 2. Extract the downloaded model archive into the current directory. 3. Run the conversion script: `./local/export.sh asr1_chunk_conformer_u2pp_wenetspeech_ckpt_1.3.0.model/model.yaml asr1_chunk_conformer_u2pp_wenetspeech_ckpt_1.3.0.model/exp/chunk_conformer_u2pp/checkpoints/avg_10 ./export.ji` The error is as follows: ``` Traceback (most recent call last): File "/home/PaddleSpeech-r1.4/paddlespeech/s2t/exps/u2/bin//export.py", line 53, in main(config, args) File "/home/PaddleSpeech-r1.4/paddlespeech/s2t/exps/u2/bin//export.py", line 30, in main main_sp(config,...

Bug
S2T

When synthesizing audio with fastspeech2_aishell3 and pwgan_aishell3, several different speakers' voices still appear even after specifying spk_id, and some characters are not pronounced clearly. What could be the cause? The code is as follows: source path.sh FLAGS_allocator_strategy=naive_best_fit \ FLAGS_fraction_of_gpu_memory_to_use=0.01 \ python3 ${BIN_DIR}/../synthesize_e2e.py \ --am fastspeech2_aishell3 \ --am_config fastspeech2_aishell3_ckpt_1.1.0/default.yaml \ --am_ckpt fastspeech2_aishell3_ckpt_1.1.0/snapshot_iter_96400.pdz \ --am_stat fastspeech2_aishell3_ckpt_1.1.0/speech_stats.npy \ --voc pwgan_aishell3 \ --voc_config pwg_aishell3_ckpt_0.5/default.yaml \ --voc_ckpt...

Question

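Fine-tuning a sound classification model from a PaddleSpeech pretrained model https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/docs/tutorial/cls/cls_tutorial.ipynb 1. For the same test sample with identical feature extraction, repeated predictions return a different class each time. a. I tried fixing the random seed, but that did not solve the problem. b. I tried exporting a static model for inference, but predict reports that the load_audio package cannot be found (script based on https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/examples/esc50/cls0/local/infer.sh) Details: with paddleaudio 1.01 the load_audio package cannot be found; please help confirm which version provides it, thanks. `from paddleaudio.backends import soundfile_load as load_audio` raises an error: ImportError Traceback (most recent call last) Cell In[12], line 21 19 import yaml 20 from paddle.audio.features...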

Question

What are the default sample rate, bit depth, number of channels, and refresh interval?

Question

audio_len should be 1D instead of 0D; a 0D audio_len raises a list index out of range error in the subsequent decode process ### PR types ### PR changes ### Describe

CLI
contributor
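
The 0D versus 1D distinction in this PR can be illustrated with a minimal sketch. It uses NumPy as a stand-in for the actual tensor library, and the helper name `as_1d_lengths` is hypothetical and not part of PaddleSpeech; the PR's real change is not shown in this listing.

```python
import numpy as np


def as_1d_lengths(audio_len):
    """Return audio_len as a 1-D int64 array.

    Hypothetical helper: a scalar (0-D) length is promoted to shape (1,),
    so code that indexes per-utterance lengths keeps working for a batch
    of one.
    """
    return np.atleast_1d(np.asarray(audio_len, dtype=np.int64))


# A 0-D length has an empty shape, so indexing its first dimension fails.
scalar_len = np.asarray(16000)
try:
    scalar_len.shape[0]
except IndexError as err:
    print("0-D length cannot be indexed:", err)

# The 1-D form has one entry per utterance and can be indexed safely.
batch_len = as_1d_lengths(16000)
print(batch_len.shape, int(batch_len[0]))
```

Promoting the length at the point where it is created, rather than at each place it is indexed, keeps the downstream decode code unchanged.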

### PR types Others ### PR changes Others ### Describe Fixes a typo in README.md in demo

README
Demo
contributor

This is my application.yaml protocol: 'http' engine_list: ['asr_python', 'text_python'] asr_python: model: 'conformer_wenetspeech' lang: 'zh' sample_rate: 16000 cfg_path: # [optional] ckpt_path: # [optional] decode_method: 'attention_rescoring' force_yes: True device: # set 'gpu:id' or 'cpu' text_python:...

Question

RuntimeError: (NotFound) The kernel with key (GPU, Undefined(AnyLayout), int16) of kernel `pad3d` is not registered and fail to fallback to CPU one. Selected wrong DataType `int16`. Paddle support following DataTypes:...

Bug
S2T
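
The error above says the `pad3d` kernel has no `int16` variant. One possible workaround, which the issue does not confirm, is to convert the input to a floating-point dtype before it reaches the model. The sketch below assumes the `int16` data is raw 16-bit PCM audio and uses NumPy only; `pcm16_to_float32` is a hypothetical helper name, not a PaddleSpeech function.

```python
import numpy as np


def pcm16_to_float32(samples):
    """Convert 16-bit PCM samples to float32 in the range [-1.0, 1.0).

    Assumption: the input is raw int16 audio. Dividing by 32768 (2**15)
    maps the int16 range onto [-1.0, 1.0).
    """
    samples = np.asarray(samples)
    if samples.dtype != np.int16:
        raise TypeError(f"expected int16 samples, got {samples.dtype}")
    return samples.astype(np.float32) / 32768.0


pcm = np.array([0, 16384, -32768], dtype=np.int16)
wave = pcm16_to_float32(pcm)
print(wave.dtype, wave.tolist())
```

Whether the model expects normalized floats or float values at the original int16 scale depends on how it was trained, so the scaling step should be checked against the model's own preprocessing.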