
WebUI generates the audio, but cannot play it.

allstable opened this issue 1 year ago · 3 comments

Self Checks

  • [X] This template is only for bug reports. For questions, please visit Discussions.
  • [X] I have thoroughly reviewed the project documentation (installation, training, inference) but couldn't find information to solve my problem.
  • [X] I have searched for existing issues, including closed ones.
  • [X] I confirm that I am using English to submit this report (I have read and agree to the Language Policy).
  • [X] [FOR CHINESE USERS] Please be sure to submit issues in English, otherwise they will be closed. Thank you! :)
  • [X] Please do not modify this template and fill in all required fields.

Cloud or Self Hosted

Self Hosted (Source)

Environment Details

CentOS 7, python=3.11, torch==2.4.1+cu124, torchvision==0.19.1+cu124, torchaudio==2.4.1+cu124, gradio==5.7.0
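To rule out a version mismatch, here is a quick sanity check of the active environment (a minimal sketch, nothing fish-speech-specific):

```python
# Print the installed versions and CUDA availability in the active env.
import torch, torchaudio, torchvision, gradio

print("torch:", torch.__version__)
print("torchvision:", torchvision.__version__)
print("torchaudio:", torchaudio.__version__)
print("gradio:", gradio.__version__)
print("CUDA available:", torch.cuda.is_available())
```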

Steps to Reproduce

1. Run the WebUI:

```bash
python tools/webui.py \
    --llama-checkpoint-path checkpoints/fish-speech-1.4 \
    --decoder-checkpoint-path checkpoints/fish-speech-1.4/firefly-gan-vq-fsq-8x1024-21hz-generator.pth
```

2. Synthesize audio from text and a reference audio in the WebUI:

Fill in the "Input Text" and "Reference Audio" fields, then click "Generate".

✔️ Expected Behavior

I can synthesize the audio in the WebUI and play it directly on the page.

❌ Actual Behavior

In the WebUI I can synthesize the audio, but it cannot be played in the browser; it only plays after I download the audio file to my local machine.

I have read "webui is failed to generate audio, no error reports on the backend log" (#610), but there is no answer there.
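Since the downloaded file plays fine, the generated audio itself is probably valid. A minimal check of the downloaded file (assuming the `soundfile` package is installed; the path `generated.wav` is just an example):

```python
# Sanity-check the downloaded audio file (path is a placeholder).
# A sensible sample rate and duration means the WAV itself is valid,
# pointing the problem at the browser/Gradio playback side instead.
import soundfile as sf

data, samplerate = sf.read("generated.wav")
print("dtype:", data.dtype)
print("sample rate:", samplerate)
print("duration (s):", len(data) / samplerate)
```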

allstable · Nov 28 '24 06:11

GPU type: NVIDIA A10

allstable · Nov 28 '24 06:11

The WebUI log:

```
2024-11-28 01:44:26.544 | INFO | tools.api:encode_reference:167 - Loaded audio with 5.42 seconds
/myplc/.aigc/miniconda3/envs/fish-speech/lib/python3.11/site-packages/vector_quantize_pytorch/residual_fsq.py:170: FutureWarning: torch.cuda.amp.autocast(args...) is deprecated. Please use torch.amp.autocast('cuda', args...) instead.
  with autocast(enabled = False):
2024-11-28 01:44:26.802 | INFO | tools.api:encode_reference:175 - Encoded prompt: torch.Size([8, 117])
2024-11-28 01:44:26.805 | INFO | tools.llama.generate:generate_long:759 - Encoded text: 这学校真不是一般的寒酸,统共只有一幢楼房,两层高,楼下是教室,楼上是办公室。
2024-11-28 01:44:26.806 | INFO | tools.llama.generate:generate_long:759 - Encoded text: 六间教室,一年级和二年级八个班的学生只能轮番上课,读到三年级就直接送到工厂里去实习,找不到实习单位就在家睡觉,搞得像山区小学一样。
2024-11-28 01:44:26.807 | INFO | tools.llama.generate:generate_long:759 - Encoded text: 该校没有操场,体育老师倒有三个。
2024-11-28 01:44:26.807 | INFO | tools.llama.generate:generate_long:777 - Generating sentence 1/3 of sample 1/1
  0%|          | 0/3914 [00:00<?, ?it/s]
/myplc/.aigc/miniconda3/envs/fish-speech/lib/python3.11/contextlib.py:105: FutureWarning: torch.backends.cuda.sdp_kernel() is deprecated. In the future, this context manager will be removed. Please see torch.nn.attention.sdpa_kernel() for the new context manager, with updated signature.
  self.gen = func(*args, **kwds)
  5%|██████▎   | 177/3914 [00:08<03:05, 20.16it/s]
2024-11-28 01:44:35.657 | INFO | tools.llama.generate:generate_long:832 - Generated 179 tokens in 8.85 seconds, 20.23 tokens/sec
2024-11-28 01:44:35.658 | INFO | tools.llama.generate:generate_long:835 - Bandwidth achieved: 10.00 GB/s
2024-11-28 01:44:35.658 | INFO | tools.llama.generate:generate_long:840 - GPU Memory used: 1.38 GB
2024-11-28 01:44:35.658 | INFO | tools.llama.generate:generate_long:777 - Generating sentence 2/3 of sample 1/1
2024-11-28 01:44:35.661 | INFO | tools.api:decode_vq_tokens:189 - VQ features: torch.Size([8, 178])
  8%|██████████▉ | 287/3685 [00:14<02:53, 19.59it/s]
2024-11-28 01:44:50.460 | INFO | tools.llama.generate:generate_long:832 - Generated 289 tokens in 14.80 seconds, 19.53 tokens/sec
2024-11-28 01:44:50.460 | INFO | tools.llama.generate:generate_long:835 - Bandwidth achieved: 9.66 GB/s
2024-11-28 01:44:50.461 | INFO | tools.llama.generate:generate_long:840 - GPU Memory used: 1.54 GB
2024-11-28 01:44:50.461 | INFO | tools.llama.generate:generate_long:777 - Generating sentence 3/3 of sample 1/1
2024-11-28 01:44:50.464 | INFO | tools.api:decode_vq_tokens:189 - VQ features: torch.Size([8, 288])
  2%|██▊       | 68/3375 [00:03<02:48, 19.60it/s]
2024-11-28 01:44:54.086 | INFO | tools.llama.generate:generate_long:832 - Generated 70 tokens in 3.62 seconds, 19.32 tokens/sec
2024-11-28 01:44:54.086 | INFO | tools.llama.generate:generate_long:835 - Bandwidth achieved: 9.55 GB/s
2024-11-28 01:44:54.087 | INFO | tools.llama.generate:generate_long:840 - GPU Memory used: 1.64 GB
2024-11-28 01:44:54.088 | INFO | tools.api:decode_vq_tokens:189 - VQ features: torch.Size([8, 69])
/myplc/.aigc/miniconda3/envs/fish-speech/lib/python3.11/site-packages/gradio/processing_utils.py:738: UserWarning: Trying to convert audio automatically from float32 to 16-bit int format.
  warnings.warn(warning.format(data.dtype))
```
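The only anomaly in the log is the final Gradio warning about automatically converting float32 audio to 16-bit int. If the playback failure is related to that implicit conversion, one thing worth trying is converting the waveform to int16 before handing it to the Gradio Audio component. This is only a sketch of the conversion, not a confirmed fix; `waveform` and `sample_rate` are placeholder names, not identifiers from the fish-speech codebase:

```python
# Sketch: convert a float32 waveform in [-1.0, 1.0] to int16 before
# returning it to a Gradio Audio component, so Gradio does not have to
# convert it implicitly (the source of the UserWarning above).
import numpy as np

def to_int16(waveform: np.ndarray) -> np.ndarray:
    clipped = np.clip(waveform, -1.0, 1.0)       # guard against overshoot
    return (clipped * 32767.0).astype(np.int16)  # scale to int16 range

# A Gradio Audio output accepts a (sample_rate, data) tuple, e.g.:
# return sample_rate, to_int16(waveform)
```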

allstable · Nov 28 '24 06:11

This issue is stale because it has been open for 30 days with no activity.

github-actions[bot] · Dec 29 '24 00:12

This issue was closed because it has been inactive for 14 days since being marked as stale.

github-actions[bot] · Jan 12 '25 00:01