WebUI generates the audio but cannot play it.
Self Checks
- [X] This template is only for bug reports. For questions, please visit Discussions.
- [X] I have thoroughly reviewed the project documentation (installation, training, inference) but couldn't find information to solve my problem.
- [X] I have searched for existing issues, including closed ones.
- [X] I confirm that I am using English to submit this report (I have read and agree to the Language Policy).
- [X] [FOR CHINESE USERS] Please be sure to submit issues in English, otherwise they will be closed. Thank you! :)
- [X] Please do not modify this template and fill in all required fields.
Cloud or Self Hosted
Self Hosted (Source)
Environment Details
CentOS 7, Python 3.11, torch==2.4.1+cu124, torchvision==0.19.1+cu124, torchaudio==2.4.1+cu124, gradio==5.7.0
Steps to Reproduce
1. Run the WebUI:
python tools/webui.py \
--llama-checkpoint-path checkpoints/fish-speech-1.4 \
--decoder-checkpoint-path checkpoints/fish-speech-1.4/firefly-gan-vq-fsq-8x1024-21hz-generator.pth
2. Synthesize audio from text and a reference audio clip in the WebUI:
Set the "Input Text" and "Reference Audio" fields, then click "Generate".
✔️ Expected Behavior
The audio should be synthesized in the WebUI and be playable directly on the page.
❌ Actual Behavior
In the WebUI I can synthesize the audio, but it cannot be played in the browser; I can only listen to it after downloading the audio file to my local machine.
I have read issue "webui is failed to generate audio, no error reports on the backend log" #610, but it contains no answer.
GPU type: NVIDIA A10
The WebUI log:
2024-11-28 01:44:26.544 | INFO | tools.api:encode_reference:167 - Loaded audio with 5.42 seconds
/myplc/.aigc/miniconda3/envs/fish-speech/lib/python3.11/site-packages/vector_quantize_pytorch/residual_fsq.py:170: FutureWarning: torch.cuda.amp.autocast(args...) is deprecated. Please use torch.amp.autocast('cuda', args...) instead.
with autocast(enabled = False):
2024-11-28 01:44:26.802 | INFO | tools.api:encode_reference:175 - Encoded prompt: torch.Size([8, 117])
2024-11-28 01:44:26.805 | INFO | tools.llama.generate:generate_long:759 - Encoded text: 这学校真不是一般的寒酸,统共只有一幢楼房,两层高,楼下是教室,楼上是办公室。
2024-11-28 01:44:26.806 | INFO | tools.llama.generate:generate_long:759 - Encoded text: 六间教室,一年级和二年级八个班的学生只能轮番上课,读到三年级就直接送到工厂里去实习,找不到实习单位就在家睡觉,搞得像山区小学一样。
2024-11-28 01:44:26.807 | INFO | tools.llama.generate:generate_long:759 - Encoded text: 该校没有操场,体育老师倒有三个。
2024-11-28 01:44:26.807 | INFO | tools.llama.generate:generate_long:777 - Generating sentence 1/3 of sample 1/1
0%| | 0/3914 [00:00<?, ?it/s]/myplc/.aigc/miniconda3/envs/fish-speech/lib/python3.11/contextlib.py:105: FutureWarning: torch.backends.cuda.sdp_kernel() is deprecated. In the future, this context manager will be removed. Please see torch.nn.attention.sdpa_kernel() for the new context manager, with updated signature.
self.gen = func(*args, **kwds)
5%|██████▎ | 177/3914 [00:08<03:05, 20.16it/s]
2024-11-28 01:44:35.657 | INFO | tools.llama.generate:generate_long:832 - Generated 179 tokens in 8.85 seconds, 20.23 tokens/sec
2024-11-28 01:44:35.658 | INFO | tools.llama.generate:generate_long:835 - Bandwidth achieved: 10.00 GB/s
2024-11-28 01:44:35.658 | INFO | tools.llama.generate:generate_long:840 - GPU Memory used: 1.38 GB
2024-11-28 01:44:35.658 | INFO | tools.llama.generate:generate_long:777 - Generating sentence 2/3 of sample 1/1
2024-11-28 01:44:35.661 | INFO | tools.api:decode_vq_tokens:189 - VQ features: torch.Size([8, 178])
8%|██████████▉ | 287/3685 [00:14<02:53, 19.59it/s]
2024-11-28 01:44:50.460 | INFO | tools.llama.generate:generate_long:832 - Generated 289 tokens in 14.80 seconds, 19.53 tokens/sec
2024-11-28 01:44:50.460 | INFO | tools.llama.generate:generate_long:835 - Bandwidth achieved: 9.66 GB/s
2024-11-28 01:44:50.461 | INFO | tools.llama.generate:generate_long:840 - GPU Memory used: 1.54 GB
2024-11-28 01:44:50.461 | INFO | tools.llama.generate:generate_long:777 - Generating sentence 3/3 of sample 1/1
2024-11-28 01:44:50.464 | INFO | tools.api:decode_vq_tokens:189 - VQ features: torch.Size([8, 288])
2%|██▊ | 68/3375 [00:03<02:48, 19.60it/s]
2024-11-28 01:44:54.086 | INFO | tools.llama.generate:generate_long:832 - Generated 70 tokens in 3.62 seconds, 19.32 tokens/sec
2024-11-28 01:44:54.086 | INFO | tools.llama.generate:generate_long:835 - Bandwidth achieved: 9.55 GB/s
2024-11-28 01:44:54.087 | INFO | tools.llama.generate:generate_long:840 - GPU Memory used: 1.64 GB
2024-11-28 01:44:54.088 | INFO | tools.api:decode_vq_tokens:189 - VQ features: torch.Size([8, 69])
/myplc/.aigc/miniconda3/envs/fish-speech/lib/python3.11/site-packages/gradio/processing_utils.py:738: UserWarning: Trying to convert audio automatically from float32 to 16-bit int format.
warnings.warn(warning.format(data.dtype))
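The last warning in the log shows Gradio converting the returned waveform from float32 to 16-bit int on the fly. A minimal sketch of doing that conversion explicitly before handing the array to Gradio (assuming the WebUI returns a float32 NumPy array in the range [-1.0, 1.0]; `to_int16` is a hypothetical helper, not part of the fish-speech codebase):

```python
import numpy as np

def to_int16(audio: np.ndarray) -> np.ndarray:
    """Convert float32 audio in [-1.0, 1.0] to 16-bit PCM.

    Converting explicitly avoids Gradio's automatic float32 -> int16
    conversion and the UserWarning seen in the log above.
    """
    # Clip first so out-of-range samples don't wrap around on cast.
    audio = np.clip(audio, -1.0, 1.0)
    return (audio * 32767.0).astype(np.int16)

samples = np.array([0.0, 0.5, -1.0], dtype=np.float32)
print(to_int16(samples))  # 16-bit PCM values
```

This is only a workaround for the dtype warning; it may not be the cause of the playback failure itself (a browser codec or MIME-type issue is also possible).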