CONFUSING! TELlamaModel error and torchaudio.save runtime error
1. This issue happens every time. The README says "Optional: Install TransformerEngine if using NVIDIA GPU (Linux only), The adaptation of TransformerEngine is currently under development and CANNOT run properly now". I'm using a GPU on Linux, so do I have to install it or not? This is confusing:
use default LlamaModel for importing TELlamaModel error: No module named 'transformer_engine'
2. I tested with a very simple code snippet, but it didn't work. Here is the runtime error:
We detected that you are passing `past_key_values` as a tuple and this is deprecated and will be removed in v4.43. Please use an appropriate `Cache` class (https://huggingface.co/docs/transformers/v4.41.3/en/internal/generation_utils#transformers.Cache)
text: 12%|███████████▌ | 46/384(max) [00:00, 118.84it/s]
code: 20%|██████████████████▊ | 406/2048(max) [00:01, 213.68it/s]
/home/ck/anaconda3/envs/tts/lib/python3.11/site-packages/torchaudio/_backend/ffmpeg.py:245: UserWarning: The use of `x.T` on tensors of dimension other than 2 to reverse their shape is deprecated and it will throw an error in a future release. Consider `x.mT` to transpose batches of matrices or `x.permute(*torch.arange(x.ndim - 1, -1, -1))` to reverse the dimensions of a tensor. (Triggered internally at ../aten/src/ATen/native/TensorShape.cpp:3697.)
  src = src.T
Traceback (most recent call last):
  File "/home/ck/ai_project/tts_server/basic_test.py", line 13, in <module>
    torchaudio.save(f"basic_output{i}.wav", torch.from_numpy(wavs[i]).unsqueeze(0), 24000)
  File "/home/ck/anaconda3/envs/tts/lib/python3.11/site-packages/torchaudio/_backend/utils.py", line 313, in save
    return backend.save(
           ^^^^^^^^^^^^^
  File "/home/ck/anaconda3/envs/tts/lib/python3.11/site-packages/torchaudio/_backend/ffmpeg.py", line 316, in save
    save_audio(
  File "/home/ck/anaconda3/envs/tts/lib/python3.11/site-packages/torchaudio/_backend/ffmpeg.py", line 257, in save_audio
    s.write_audio_chunk(0, src)
  File "/home/ck/anaconda3/envs/tts/lib/python3.11/site-packages/torio/io/_streaming_media_encoder.py", line 469, in write_audio_chunk
    self._s.write_audio_chunk(i, chunk, pts)
RuntimeError: Input Tensor has to be 2D.
Exception raised from validate_audio_input at /__w/audio/audio/pytorch/audio/src/libtorio/ffmpeg/stream_writer/tensor_converter.cpp:31 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x96 (0x7def216cbf86 in /home/ck/anaconda3/envs/tts/lib/python3.11/site-packages/torch/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, char const*) + 0x68 (0x7def2167add9 in /home/ck/anaconda3/envs/tts/lib/python3.11/site-packages/torch/lib/libc10.so)
frame #2: <unknown function> + 0x57147 (0x7def20ee0147 in /home/ck/anaconda3/envs/tts/lib/python3.11/site-packages/torio/lib/libtorio_ffmpeg4.so)
frame #3: <unknown function> + 0x57afc (0x7def20ee0afc in /home/ck/anaconda3/envs/tts/lib/python3.11/site-packages/torio/lib/libtorio_ffmpeg4.so)
frame #4: torio::io::TensorConverter::convert(at::Tensor const&) + 0x33 (0x7def20ee2723 in /home/ck/anaconda3/envs/tts/lib/python3.11/site-packages/torio/lib/libtorio_ffmpeg4.so)
frame #5: torio::io::EncodeProcess::process(at::Tensor const&, std::optional<double> const&) + 0xbe (0x7def20ed15ee in /home/ck/anaconda3/envs/tts/lib/python3.11/site-packages/torio/lib/libtorio_ffmpeg4.so)
frame #6: torio::io::StreamingMediaEncoder::write_audio_chunk(int, at::Tensor const&, std::optional<double> const&) + 0xa5 (0x7def20edcd85 in /home/ck/anaconda3/envs/tts/lib/python3.11/site-packages/torio/lib/libtorio_ffmpeg4.so)
frame #7: <unknown function> + 0x3a306 (0x7dee63d09306 in /home/ck/anaconda3/envs/tts/lib/python3.11/site-packages/torio/lib/_torio_ffmpeg4.so)
frame #8: <unknown function> + 0x32bf7 (0x7dee63d01bf7 in /home/ck/anaconda3/envs/tts/lib/python3.11/site-packages/torio/lib/_torio_ffmpeg4.so)
frame #9: python() [0x528767]
<omitting python frames>
frame #12: python() [0x5cbeda]
frame #14: python() [0x5ec6a7]
frame #15: python() [0x5e8240]
frame #16: python() [0x5fd192]
frame #21: <unknown function> + 0x29d90 (0x7def73e29d90 in /lib/x86_64-linux-gnu/libc.so.6)
frame #22: __libc_start_main + 0x80 (0x7def73e29e40 in /lib/x86_64-linux-gnu/libc.so.6)
frame #23: python() [0x5bbac3]
The code I tested:
import ChatTTS
import torch
import torchaudio
chat = ChatTTS.Chat()
chat.load(compile=False) # Set to True for better performance
texts = ["chat T T S is a text to speech model designed for dialogue applications.", "[uv_break]it supports mixed language input [uv_break]"]
wavs = chat.infer(texts)
for i in range(len(wavs)):
    torchaudio.save(f"basic_output{i}.wav", torch.from_numpy(wavs[i]).unsqueeze(0), 24000)
> I'm using a GPU on Linux, so do I have to install it or not?
No, you should not install it.
> I tested with a very simple code snippet, but it didn't work
It's a known problem; see #635.
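On the first question: the log line "use default LlamaModel for importing TELlamaModel error: No module named 'transformer_engine'" reads like an optional-import fallback, so it is presumably informational rather than fatal. Below is a minimal sketch of that general pattern using only the standard library. It is not ChatTTS's actual code, and the helper name `has_module` is hypothetical.

```python
import importlib.util


def has_module(name: str) -> bool:
    """Return True if a top-level module called `name` can be found."""
    # find_spec returns None when a top-level module is not installed.
    return importlib.util.find_spec(name) is not None


# Illustration of the fallback idea only (assumption, not ChatTTS internals):
# prefer the optional backend if it is installed, otherwise use the default.
if has_module("transformer_engine"):
    backend = "TransformerEngine-based model"
else:
    backend = "default LlamaModel"
print(f"Using {backend}")
```

Running this in the same environment shows whether `transformer_engine` is importable at all, which is a quick way to confirm the message matches the installed packages.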
It works, thanks. So does that mean every "unsqueeze(0)" should be removed by default? Is the example wrong, or are there other uses of "unsqueeze(0)"?
From the example:
torchaudio.save(f"output_sentence_level_{i}.wav", torch.from_numpy(wavs[i]).unsqueeze(0), 24000)
torchaudio.save(f"output_word_level_{i}.wav", torch.from_numpy(wavs[0]).unsqueeze(0), 24000)
> It works, thanks. So does that mean every "unsqueeze(0)" should be removed by default?
No. In fact, some versions of torchaudio (usually newer ones) will fail if unsqueeze(0) is missing.
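Since the traceback reports "Input Tensor has to be 2D", one way to avoid guessing is to normalize the shape before saving instead of calling unsqueeze(0) unconditionally. The following is a minimal sketch, assuming each `wavs[i]` is a NumPy array shaped either (samples,) or (channels, samples). The helper names `to_channels_first_2d` and `save_wav` are hypothetical and not part of ChatTTS or torchaudio.

```python
import numpy as np


def to_channels_first_2d(wav):
    """Return `wav` as a 2D array shaped (channels, samples)."""
    arr = np.asarray(wav)
    if arr.ndim == 1:
        # Mono audio without a channel axis: (samples,) -> (1, samples).
        return arr[np.newaxis, :]
    if arr.ndim == 2:
        # Already has a channel axis (assumed to be channels-first).
        return arr
    raise ValueError(f"expected a 1D or 2D array, got shape {arr.shape}")


def save_wav(path, wav, sample_rate=24000):
    """Save `wav` with torchaudio after normalizing it to 2D."""
    # Imported here so the shape helper can be used without torch installed.
    import torch
    import torchaudio

    tensor = torch.from_numpy(to_channels_first_2d(wav))
    torchaudio.save(path, tensor, sample_rate)
```

With this, the loop in the test script could call `save_wav(f"basic_output{i}.wav", wavs[i])` whether or not the array already carries a channel axis.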
I tried soundfile instead, which works:
import soundfile
soundfile.write(output_filename, wavs[0], 24000)
> Tried with soundfile instead, which works
> import soundfile
> soundfile.write(output_filename, wavs[0], 24000)
Yes. There are many alternative ways to save audio.
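If soundfile is used for every output rather than just `wavs[0]`, note that `soundfile.write` treats 2D data as (frames, channels), the opposite of torchaudio's default channels-first layout. The sketch below therefore flattens mono audio to 1D before writing. It assumes each item is a NumPy array shaped (samples,) or (1, samples); `to_mono_1d` and `write_all` are hypothetical helper names.

```python
import numpy as np


def to_mono_1d(wav):
    """Return mono audio as a 1D array of samples."""
    arr = np.asarray(wav)
    if arr.ndim == 1:
        return arr
    if arr.ndim == 2 and arr.shape[0] == 1:
        # Drop the leading channel axis: (1, samples) -> (samples,).
        return arr[0]
    raise ValueError(f"expected mono audio, got shape {arr.shape}")


def write_all(wavs, prefix="basic_output", sample_rate=24000):
    """Write each item of `wavs` to '<prefix><i>.wav' with soundfile."""
    # Imported here so the shape helper works without soundfile installed.
    import soundfile

    for i, wav in enumerate(wavs):
        soundfile.write(f"{prefix}{i}.wav", to_mono_1d(wav), sample_rate)
```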
This issue was closed because it has been inactive for 15 days since being marked as stale.