
[Bug] 'GPT2InferenceModel' object has no attribute 'generate'

Open • ErimatOesteRP opened this issue 7 months ago • 5 comments

Describe the bug

uv run .\teste1.py
Arquivo WAV: Sample rate=24000, Channels=1

tts_models/multilingual/multi-dataset/xtts_v2 is already downloaded.
Using model: xtts
GPT2InferenceModel has generative capabilities, as prepare_inputs_for_generation is explicitly defined. However, it doesn't directly inherit from GenerationMixin. From 👉v4.50👈 onwards, PreTrainedModel will NOT inherit from GenerationMixin, and this model will lose the ability to call generate and other related functions.

  • If you're using trust_remote_code=True, you can get rid of this warning by loading the model with an auto class. See https://huggingface.co/docs/transformers/en/model_doc/auto#auto-classes
  • If you are the owner of the model architecture code, please modify your model class such that it inherits from GenerationMixin (after PreTrainedModel, otherwise you'll get an exception).
  • If you are not the owner of the model architecture class, please contact the model code owner to update it.

Text splitted to sentences.
['esse é um teste de clonagem de voz']
Traceback (most recent call last):
  File "D:\bkp_hd\Projetos\python\TTS_02\teste1.py", line 22, in <module>
    tts.tts_to_file(text=text, speaker_wav=wav_path, language="pt", file_path=output_path)
  File "D:\bkp_hd\Projetos\python\TTS_02\.venv\Lib\site-packages\TTS\api.py", line 334, in tts_to_file
    wav = self.tts(
          ^^^^^^^^^
  File "D:\bkp_hd\Projetos\python\TTS_02\.venv\Lib\site-packages\TTS\api.py", line 276, in tts
    wav = self.synthesizer.tts(
          ^^^^^^^^^^^^^^^^^^^^^
  File "D:\bkp_hd\Projetos\python\TTS_02\.venv\Lib\site-packages\TTS\utils\synthesizer.py", line 386, in tts
    outputs = self.tts_model.synthesize(
              ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\bkp_hd\Projetos\python\TTS_02\.venv\Lib\site-packages\TTS\tts\models\xtts.py", line 419, in synthesize
    return self.full_inference(text, speaker_wav, language, **settings)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\bkp_hd\Projetos\python\TTS_02\.venv\Lib\site-packages\torch\utils\_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "D:\bkp_hd\Projetos\python\TTS_02\.venv\Lib\site-packages\TTS\tts\models\xtts.py", line 488, in full_inference
    return self.inference(
           ^^^^^^^^^^^^^^^
  File "D:\bkp_hd\Projetos\python\TTS_02\.venv\Lib\site-packages\torch\utils\_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "D:\bkp_hd\Projetos\python\TTS_02\.venv\Lib\site-packages\TTS\tts\models\xtts.py", line 541, in inference
    gpt_codes = self.gpt.generate(
                ^^^^^^^^^^^^^^^^^^
  File "D:\bkp_hd\Projetos\python\TTS_02\.venv\Lib\site-packages\TTS\tts\layers\xtts\gpt.py", line 590, in generate
    gen = self.gpt_inference.generate(
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\bkp_hd\Projetos\python\TTS_02\.venv\Lib\site-packages\torch\nn\modules\module.py", line 1940, in __getattr__
    raise AttributeError(
AttributeError: 'GPT2InferenceModel' object has no attribute 'generate'

Update the model to support transformers 4.50+
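The warning in the log explains the failure: generate is provided by GenerationMixin, and once PreTrainedModel stops inheriting from it (transformers 4.50+), a model class that never lists the mixin itself loses the method. The sketch below reproduces that attribute lookup with hypothetical pure-Python stand-in classes, not the real transformers ones. In real code, the fix the warning suggests would look something like class GPT2InferenceModel(GPT2PreTrainedModel, GenerationMixin); that line is an assumption about how the class would be patched, not a quote from either codebase.

```python
# Hypothetical stand-ins that mimic the transformers class layout;
# none of these are the real transformers classes.


class GenerationMixin:
    """Stand-in for the mixin that provides generate()."""

    def generate(self) -> str:
        return "generated"


class OldPreTrainedModel(GenerationMixin):
    """Before 4.50: the base model class inherited generate() from the mixin."""


class NewPreTrainedModel:
    """From 4.50 on: the base model class no longer inherits the mixin."""


class OldInferenceModel(OldPreTrainedModel):
    """Model written against the old layout; it never names GenerationMixin."""


class BrokenInferenceModel(NewPreTrainedModel):
    """Same model code against the new layout: generate() is gone."""


class FixedInferenceModel(NewPreTrainedModel, GenerationMixin):
    """Fix suggested by the warning: list the mixin after the base class."""


for cls in (OldInferenceModel, BrokenInferenceModel, FixedInferenceModel):
    # hasattr() performs the same lookup that raised AttributeError above.
    print(cls.__name__, hasattr(cls(), "generate"))
```

Only the middle class reproduces the error; adding the mixin back restores generate without touching the rest of the model.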

To Reproduce

from TTS.api import TTS
from TTS.tts.configs.xtts_config import XttsConfig
from TTS.tts.models.xtts import XttsAudioConfig, XttsArgs
from TTS.config.shared_configs import BaseDatasetConfig
import torch
import soundfile as sf

# Add these globals to the list of allowed globals
torch.serialization.add_safe_globals([XttsConfig, XttsAudioConfig, BaseDatasetConfig, XttsArgs])

# Check the reference WAV file
wav_path = "D:/bkp_hd/Projetos/python/TTS_02/temp/referencia.wav"
data, sample_rate = sf.read(wav_path)
print(f"Arquivo WAV: Sample rate={sample_rate}, Channels={data.shape[1] if data.ndim > 1 else 1}")

# Initialize the TTS model
tts = TTS(model_name="tts_models/multilingual/multi-dataset/xtts_v2", progress_bar=True, gpu=False)

# Generate audio
text = "esse é um teste de clonagem de voz"
output_path = "D:/bkp_hd/Projetos/python/TTS_02/temp/output.wav"
tts.tts_to_file(text=text, speaker_wav=wav_path, language="pt", file_path=output_path)

print(f"Áudio gerado com sucesso: {output_path}")

Expected behavior

No response

Logs


Environment

uv pip show TTS transformers soundfile
Name: soundfile
Version: 0.13.1
Location: D:\bkp_hd\Projetos\python\TTS_02\.venv\Lib\site-packages
Requires: cffi, numpy
Required-by: librosa, trainer, tts
---
Name: transformers
Version: 4.52.1
Location: D:\bkp_hd\Projetos\python\TTS_02\.venv\Lib\site-packages
Requires: filelock, huggingface-hub, numpy, packaging, pyyaml, regex, requests, safetensors, tokenizers, tqdm
Required-by: tts
---
Name: tts
Version: 0.22.0

Additional context

No response

ErimatOesteRP • May 22 '25 14:05

This is fixed in our fork (available via pip install coqui-tts). This repo is not maintained anymore.
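Both the original package and the fork are, as far as I can tell, imported under the same name (from TTS.api import TTS keeps working after pip install coqui-tts), so it is not always obvious which one an environment actually has. A small sketch using only the standard library's importlib.metadata; the function name tts_distribution is hypothetical and not part of either package.

```python
from __future__ import annotations

from importlib.metadata import PackageNotFoundError, version


def tts_distribution() -> tuple[str, str] | None:
    """Return (name, version) of the first installed TTS distribution found, or None."""
    # Check the maintained fork first, then the original (unmaintained) package.
    for dist in ("coqui-tts", "TTS"):
        try:
            return dist, version(dist)
        except PackageNotFoundError:
            continue
    return None


if __name__ == "__main__":
    found = tts_distribution()
    if found is None:
        print("No TTS distribution installed")
    else:
        name, ver = found
        print(f"TTS module provided by {name} {ver}")
```

If it reports the original TTS distribution, uninstalling it before installing coqui-tts is probably the safer route, so that two distributions do not share one TTS package directory.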

eginhard • May 23 '25 10:05

Thank you very much

ErimatOesteRP • May 23 '25 12:05

Downgrade transformers; it works for me: pip install transformers==4.33.0
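If you stay on TTS 0.22.0 with a downgraded transformers, a guard along these lines can fail fast with a readable message instead of the AttributeError in the traceback. It is a sketch under stated assumptions: the 4.50 cut-off comes from the transformers warning in the log, 4.33.0 is simply the version reported to work in this thread (versions in between were not tested here), and the helper names are hypothetical.

```python
import re


def major_minor(ver: str) -> tuple:
    """Parse the leading 'major.minor' of a version string such as '4.52.1'."""
    match = re.match(r"(\d+)\.(\d+)", ver)
    if match is None:
        raise ValueError(f"unrecognised version string: {ver!r}")
    return int(match.group(1)), int(match.group(2))


def transformers_too_new(transformers_version: str) -> bool:
    """True for transformers 4.50 or later, where (per the warning)
    PreTrainedModel no longer inherits from GenerationMixin."""
    return major_minor(transformers_version) >= (4, 50)


if __name__ == "__main__":
    from importlib.metadata import PackageNotFoundError, version

    try:
        installed = version("transformers")
    except PackageNotFoundError:
        installed = None
    if installed and transformers_too_new(installed):
        print(f"transformers {installed} is too new for TTS 0.22.0; "
              "try: pip install transformers==4.33.0")
```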

wapping • Jun 11 '25 09:06

I downgraded and it works now. Thank you very much!

ErimatOesteRP • Jun 11 '25 12:06

I'd recommend using the fork (pip install coqui-tts) instead. It supports newer transformers versions and includes many other bug fixes.

eginhard • Jun 11 '25 12:06

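Looking at the project's requirements.txt, the root cause of this problem is that TTS does not constrain the versions of its dependencies, so the latest versions get installed, and TTS itself is not compatible with those newer versions. The best fix is to pin the version of every dependency, which solves the problem once and for all.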

ibaoger • Jun 19 '25 03:06


I translated this to English because it helped me a lot to understand the issue (I used ChatGPT):

Translation to English: 'After reviewing the project's requirements.txt, the root cause of the issue is that TTS does not specify version constraints for its dependencies. As a result, the latest versions of the dependencies are installed, which are not compatible with TTS itself. The best solution is to explicitly specify the version of each dependency to resolve the problem once and for all.'

aspirant2018 • Jun 20 '25 08:06

This list of packages, with the corresponding versions, is what let me run TTS inference correctly:

absl-py==2.3.0
aiohappyeyeballs==2.6.1
aiohttp==3.12.13
aiosignal==1.3.2
annotated-types==0.7.0
anyascii==0.3.2
attrs==25.3.0
audioread==3.0.1
babel==2.17.0
bangla==0.0.5
blinker==1.9.0
blis==1.2.1
bnnumerizer==0.0.2
bnunicodenormalizer==0.1.7
catalogue==2.0.10
certifi==2025.6.15
cffi==1.17.1
charset-normalizer==3.4.2
click==8.2.1
cloudpathlib==0.21.1
colorama==0.4.6
confection==0.1.5
contourpy==1.3.2
coqpit==0.0.17
cycler==0.12.1
cymem==2.0.11
Cython==3.1.2
dateparser==1.1.8
decorator==5.2.1
docopt==0.6.2
einops==0.8.1
encodec==0.1.1
filelock==3.18.0
Flask==3.1.1
fonttools==4.58.4
frozenlist==1.7.0
fsspec==2025.5.1
g2pkk==0.1.2
grpcio==1.73.0
gruut==2.2.3
gruut-ipa==0.13.0
gruut_lang_de==2.0.1
gruut_lang_en==2.0.1
gruut_lang_es==2.0.1
gruut_lang_fr==2.0.2
hangul-romanize==0.1.0
huggingface-hub==0.33.0
idna==3.10
inflect==7.5.0
itsdangerous==2.2.0
jamo==0.4.1
jieba==0.42.1
Jinja2==3.1.6
joblib==1.5.1
jsonlines==1.2.0
kiwisolver==1.4.8
langcodes==3.5.0
language_data==1.3.0
lazy_loader==0.4
librosa==0.11.0
llvmlite==0.44.0
marisa-trie==1.2.1
Markdown==3.8.2
markdown-it-py==3.0.0
MarkupSafe==3.0.2
matplotlib==3.10.3
mdurl==0.1.2
more-itertools==10.7.0
mpmath==1.3.0
msgpack==1.1.1
multidict==6.5.0
murmurhash==1.0.13
networkx==2.8.8
nltk==3.9.1
num2words==0.5.14
numba==0.61.2
numpy==1.26.4
packaging==25.0
pandas==1.5.3
pillow==11.0.0
platformdirs==4.3.8
pooch==1.8.2
preshed==3.0.10
propcache==0.3.2
protobuf==6.31.1
psutil==7.0.0
pycparser==2.22
pydantic==2.11.7
pydantic_core==2.33.2
Pygments==2.19.1
pynndescent==0.5.13
pyparsing==3.2.3
pypinyin==0.54.0
pysbd==0.3.4
python-crfsuite==0.9.11
python-dateutil==2.9.0.post0
pytz==2025.2
PyYAML==6.0.2
regex==2024.11.6
requests==2.32.4
rich==14.0.0
safetensors==0.5.3
scikit-learn==1.7.0
scipy==1.15.3
shellingham==1.5.4
six==1.17.0
smart-open==7.1.0
soundfile==0.13.1
soxr==0.5.0.post1
spacy==3.8.7
spacy-legacy==3.0.12
spacy-loggers==1.0.5
srsly==2.5.1
SudachiDict-core==20250515
SudachiPy==0.6.10
sympy==1.13.1
tensorboard==2.19.0
tensorboard-data-server==0.7.2
thinc==8.3.4
threadpoolctl==3.6.0
tokenizers==0.13.3
torch==2.5.1+cu121
torchaudio==2.5.1+cu121
torchvision==0.20.1+cu121
tqdm==4.67.1
trainer==0.0.36
transformers==4.33.0
TTS==0.22.0
typeguard==4.4.4
typer==0.16.0
typing-inspection==0.4.1
typing_extensions==4.14.0
tzdata==2025.2
tzlocal==5.3.1
umap-learn==0.5.7
Unidecode==1.4.0
urllib3==2.5.0
wasabi==1.1.3
weasel==0.4.1
Werkzeug==3.1.3
wrapt==1.17.2
yarl==1.20.1
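To catch an environment drifting away from a known-good set like the one above, a small check can compare installed versions against a few of its pins. This is a sketch using the standard library's importlib.metadata; the choice of pins and the names KEY_PINS, pin_mismatches and current_versions are illustrative, not part of TTS.

```python
from __future__ import annotations

from collections.abc import Iterable
from importlib.metadata import PackageNotFoundError, version

# A few of the pins from the list above, chosen here for illustration.
KEY_PINS = {
    "TTS": "0.22.0",
    "transformers": "4.33.0",
    "tokenizers": "0.13.3",
}


def pin_mismatches(pins: dict[str, str], installed: dict[str, str | None]) -> list[str]:
    """Describe every package whose installed version differs from its pin."""
    problems = []
    for name, wanted in pins.items():
        have = installed.get(name)
        if have is None:
            problems.append(f"{name}: not installed (want {wanted})")
        elif have != wanted:
            problems.append(f"{name}: {have} installed, pinned to {wanted}")
    return problems


def current_versions(names: Iterable[str]) -> dict[str, str | None]:
    """Look up installed versions; None marks a missing distribution."""
    found: dict[str, str | None] = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for line in pin_mismatches(KEY_PINS, current_versions(KEY_PINS)):
        print(line)
```

An empty output means the checked pins match; the exact string comparison is deliberate, since the point of pinning is an exact match.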

aspirant2018 • Jun 20 '25 08:06

@ibaoger @aspirant2018 As mentioned above, I'd recommend just installing the fork instead with pip install coqui-tts. It has been updated to work out of the box with recent package versions and includes many other bug fixes.

eginhard • Jun 20 '25 14:06

@eginhard Yes, I have seen your comment and recommendation. I will switch to the forked package sooner or later. Thank you again.

aspirant2018 • Jun 20 '25 14:06

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. You might also look at our discussion channels.

stale[bot] • Jul 20 '25 21:07