CONFUSING! TELlamaModel error and torchaudio.save runtime error

Open ckgithub2019 opened this issue 1 year ago • 5 comments

1. This issue happens every time. The README says "Optional: Install TransformerEngine if using NVIDIA GPU (Linux only), The adaptation of TransformerEngine is currently under development and CANNOT run properly now". I'm using a GPU on Linux, so must I install it or not? This is confusing:

use default LlamaModel for importing TELlamaModel error: No module named 'transformer_engine'

2. I tested with a very simple code snippet, but it didn't work. It fails with this runtime error:

We detected that you are passing `past_key_values` as a tuple and this is deprecated and will be removed in v4.43. Please use an appropriate `Cache` class (https://huggingface.co/docs/transformers/v4.41.3/en/internal/generation_utils#transformers.Cache)
text:   0%|▎                                                                                                  | 1/384(max) [00:00,  6.19it/s]
text:  12%|███████████▌                                                                                     | 46/384(max) [00:00, 118.84it/s]
code:   1%|█                                                                                               | 22/2048(max) [00:00, 213.37it/s]
code:  20%|██████████████████▊                                                                            | 406/2048(max) [00:01, 213.68it/s]
/home/ck/anaconda3/envs/tts/lib/python3.11/site-packages/torchaudio/_backend/ffmpeg.py:245: UserWarning: The use of `x.T` on tensors of dimension other than 2 to reverse their shape is deprecated and it will throw an error in a future release. Consider `x.mT` to transpose batches of matrices or `x.permute(*torch.arange(x.ndim - 1, -1, -1))` to reverse the dimensions of a tensor. (Triggered internally at ../aten/src/ATen/native/TensorShape.cpp:3697.)
  src = src.T
Traceback (most recent call last):
  File "/home/ck/ai_project/tts_server/basic_test.py", line 13, in <module>
    torchaudio.save(f"basic_output{i}.wav", torch.from_numpy(wavs[i]).unsqueeze(0), 24000)
  File "/home/ck/anaconda3/envs/tts/lib/python3.11/site-packages/torchaudio/_backend/utils.py", line 313, in save
    return backend.save(
           ^^^^^^^^^^^^^
  File "/home/ck/anaconda3/envs/tts/lib/python3.11/site-packages/torchaudio/_backend/ffmpeg.py", line 316, in save
    save_audio(
  File "/home/ck/anaconda3/envs/tts/lib/python3.11/site-packages/torchaudio/_backend/ffmpeg.py", line 257, in save_audio
    s.write_audio_chunk(0, src)
  File "/home/ck/anaconda3/envs/tts/lib/python3.11/site-packages/torio/io/_streaming_media_encoder.py", line 469, in write_audio_chunk
    self._s.write_audio_chunk(i, chunk, pts)
RuntimeError: Input Tensor has to be 2D.
Exception raised from validate_audio_input at /__w/audio/audio/pytorch/audio/src/libtorio/ffmpeg/stream_writer/tensor_converter.cpp:31 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x96 (0x7def216cbf86 in /home/ck/anaconda3/envs/tts/lib/python3.11/site-packages/torch/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, char const*) + 0x68 (0x7def2167add9 in /home/ck/anaconda3/envs/tts/lib/python3.11/site-packages/torch/lib/libc10.so)
frame #2: <unknown function> + 0x57147 (0x7def20ee0147 in /home/ck/anaconda3/envs/tts/lib/python3.11/site-packages/torio/lib/libtorio_ffmpeg4.so)
frame #3: <unknown function> + 0x57afc (0x7def20ee0afc in /home/ck/anaconda3/envs/tts/lib/python3.11/site-packages/torio/lib/libtorio_ffmpeg4.so)
frame #4: torio::io::TensorConverter::convert(at::Tensor const&) + 0x33 (0x7def20ee2723 in /home/ck/anaconda3/envs/tts/lib/python3.11/site-packages/torio/lib/libtorio_ffmpeg4.so)
frame #5: torio::io::EncodeProcess::process(at::Tensor const&, std::optional<double> const&) + 0xbe (0x7def20ed15ee in /home/ck/anaconda3/envs/tts/lib/python3.11/site-packages/torio/lib/libtorio_ffmpeg4.so)
frame #6: torio::io::StreamingMediaEncoder::write_audio_chunk(int, at::Tensor const&, std::optional<double> const&) + 0xa5 (0x7def20edcd85 in /home/ck/anaconda3/envs/tts/lib/python3.11/site-packages/torio/lib/libtorio_ffmpeg4.so)
frame #7: <unknown function> + 0x3a306 (0x7dee63d09306 in /home/ck/anaconda3/envs/tts/lib/python3.11/site-packages/torio/lib/_torio_ffmpeg4.so)
frame #8: <unknown function> + 0x32bf7 (0x7dee63d01bf7 in /home/ck/anaconda3/envs/tts/lib/python3.11/site-packages/torio/lib/_torio_ffmpeg4.so)
frame #9: python() [0x528767]
<omitting python frames>
frame #12: python() [0x5cbeda]
frame #14: python() [0x5ec6a7]
frame #15: python() [0x5e8240]
frame #16: python() [0x5fd192]
frame #21: <unknown function> + 0x29d90 (0x7def73e29d90 in /lib/x86_64-linux-gnu/libc.so.6)
frame #22: __libc_start_main + 0x80 (0x7def73e29e40 in /lib/x86_64-linux-gnu/libc.so.6)
frame #23: python() [0x5bbac3]

The code I tested:

import ChatTTS
import torch
import torchaudio

chat = ChatTTS.Chat()
chat.load(compile=False) # Set to True for better performance

texts = ["chat T T S is a text to speech model designed for dialogue applications.", "[uv_break]it supports mixed language input [uv_break]"]

wavs = chat.infer(texts)

for i in range(len(wavs)):
    torchaudio.save(f"basic_output{i}.wav", torch.from_numpy(wavs[i]).unsqueeze(0), 24000)

ckgithub2019 avatar Aug 08 '24 13:08 ckgithub2019

> I'm using a GPU on Linux, so must I install it or not?

No, you should not install it.
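The `use default LlamaModel for importing TELlamaModel error` line in the log appears to be a fallback notice rather than the cause of the crash: the optional import failed, so the default model was used. To confirm whether the optional package is present in an environment, a minimal sketch follows. The helper name `optional_backend_available` is hypothetical and is not part of ChatTTS.

```python
import importlib.util


def optional_backend_available(module_name: str) -> bool:
    """Return True if a top-level module can be found in this environment."""
    # find_spec returns None when a top-level module is not installed.
    return importlib.util.find_spec(module_name) is not None


if optional_backend_available("transformer_engine"):
    print("transformer_engine is installed")
else:
    print("transformer_engine is not installed; expect the default LlamaModel notice")
```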

> I tested with a very simple code snippet, but it didn't work

It's a known problem, see #635

fumiama avatar Aug 09 '24 04:08 fumiama

It works, thanks. So does that mean every "unsqueeze(0)" should be removed by default? Is the example wrong, or is there some other use for "unsqueeze(0)"?

From the example:
torchaudio.save(f"output_sentence_level_{i}.wav", torch.from_numpy(wavs[i]).unsqueeze(0), 24000)
torchaudio.save(f"output_word_level_{i}.wav", torch.from_numpy(wavs[0]).unsqueeze(0), 24000)

ckgithub2019 avatar Aug 09 '24 10:08 ckgithub2019

> It works, thanks. So does that mean every "unsqueeze(0)" should be removed by default?

No. In fact, some versions of torchaudio (usually newer ones) will raise an error if unsqueeze(0) is missing.
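One way to avoid depending on whether unsqueeze(0) is needed is to normalize the array shape before saving. The traceback above shows that the save path requires a 2D tensor, so a minimal sketch, assuming `wavs[i]` is a NumPy array that may arrive as 1D `(frames,)` or already as 2D `(channels, frames)`, could look like this. The helper name `as_channels_by_frames` is hypothetical.

```python
import numpy as np


def as_channels_by_frames(wav: np.ndarray) -> np.ndarray:
    """Return wav as a 2D (channels, frames) array."""
    wav = np.asarray(wav)
    if wav.ndim == 0:
        raise ValueError("expected at least a 1D waveform")
    if wav.ndim == 1:
        # Mono waveform without a channel axis: add one.
        return wav[np.newaxis, :]
    if wav.ndim == 2:
        # Already (channels, frames): pass through unchanged.
        return wav
    # Extra leading axes (for example (1, 1, frames)): fold them into channels.
    return wav.reshape(-1, wav.shape[-1])
```

It would then be used in place of the unconditional unsqueeze(0), for example `torchaudio.save(f"basic_output{i}.wav", torch.from_numpy(as_channels_by_frames(wavs[i])), 24000)`.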

fumiama avatar Aug 09 '24 11:08 fumiama

Tried with soundfile instead, which works:

import soundfile
soundfile.write(output_filename, wavs[0], 24000)

kevincobain2000 avatar Aug 30 '24 11:08 kevincobain2000

> Tried with soundfile instead, which works
>
> import soundfile
> soundfile.write(output_filename, wavs[0], 24000)

Yes. There are many alternatives for saving audio.
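For instance, Python's standard `wave` module can write the file with no extra dependency. This is a minimal sketch, assuming the samples are floats in the range [-1.0, 1.0] and the output should be mono 16-bit PCM. The function name `write_wav_mono16` is hypothetical.

```python
import struct
import wave
from typing import Iterable


def write_wav_mono16(path: str, samples: Iterable[float], sample_rate: int = 24000) -> int:
    """Write float samples as a mono 16-bit PCM WAV file and return the frame count."""
    pcm = bytearray()
    for s in samples:
        # Clip to the valid range, then scale to signed 16-bit integers.
        s = max(-1.0, min(1.0, float(s)))
        pcm += struct.pack("<h", int(round(s * 32767)))
    with wave.open(path, "wb") as f:
        f.setnchannels(1)        # mono
        f.setsampwidth(2)        # 2 bytes per sample = 16-bit
        f.setframerate(sample_rate)
        f.writeframes(bytes(pcm))
    return len(pcm) // 2
```

With a NumPy waveform, something like `write_wav_mono16("basic_output0.wav", wavs[0].reshape(-1))` flattens the array to one channel first.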

fumiama avatar Aug 31 '24 15:08 fumiama

This issue was closed because it has been inactive for 15 days since being marked as stale.

github-actions[bot] avatar Nov 21 '24 04:11 github-actions[bot]