piper No training possible on RTX 4090: CUFFT_INTERNAL_ERROR with torch

I encountered some problems with training, most of which I could resolve, as I will describe here. I tried it on WSL2 (Ubuntu-20.04) and a 'real' Linux Ubuntu-22.04LTS.

The WSL2 guide works well on native Linux, and of course on WSL2, with these additions:

You have to pin torchmetrics like this: pip install torchmetrics==0.11.4, as Thorsten already mentioned in his video guide. Thanks, Thorsten!

On WSL2, you may also encounter this error: "Error: WSL2 Could not load the library libcudnn_cnn_infer.so.8. Error: libcuda.so: cannot open shared object file: No such file or directory," which can be solved like this:

sudo ldconfig
cd /usr/lib/wsl/lib/
sudo mv libcuda.so.1 libcuda.so.1.backup
sudo mv libcuda.so libcuda.so.backup
sudo ln -s libcuda.so.1.1 libcuda.so.1
sudo ln -s libcuda.so.1.1 libcuda.so
sudo ldconfig

Also mentioned here github.com/microsoft/WSL/issues/5663
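
To confirm that the symlink fix took effect, one option is to ask the dynamic linker to load the driver library directly. This is a minimal sketch using only the standard library; the helper name `can_load` is made up for illustration.

```python
import ctypes


def can_load(library_name: str) -> bool:
    """Return True if the dynamic linker can find and load the library."""
    try:
        ctypes.CDLL(library_name)
        return True
    except OSError:
        # Raised when the file is missing or is not a loadable library.
        return False


if __name__ == "__main__":
    # libcuda.so is the name from the error message above; libcuda.so.1 is
    # the other symlink recreated by the commands above.
    for name in ("libcuda.so", "libcuda.so.1"):
        print(f"{name}: {'ok' if can_load(name) else 'NOT loadable'}")
```

If both names report "ok" but training still fails, the problem is probably somewhere other than the WSL library path.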

On my old system with a GTX1060, training already works on the GPU (on WSL2 and also on native Ubuntu-22.04LTS). On the new system, I can only get the CPU to work. And of course the GTX1060 still beats an i9-14900k...

With the RTX 4090 it looks like this (same on WSL2 and Ubuntu-22.04LTS):

(.venv) user@ubuntu:~/piper/src/python$ python3 -m piper_train --dataset-dir ~/piper/my-training --accelerator 'gpu' --devices 1 --batch-size 32 --validation-split 0.0 --num-test-examples 0 --max_epochs 10000 --resume_from_checkpoint ~/piper/epoch=2665-step=1182078.ckpt --checkpoint-epochs 1 --precision 32 --quality high
DEBUG:piper_train:Namespace(dataset_dir='/home/user/piper/my-training', checkpoint_epochs=1, quality='high', resume_from_single_speaker_checkpoint=None, logger=True, enable_checkpointing=True, default_root_dir=None, gradient_clip_val=None, gradient_clip_algorithm=None, num_nodes=1, num_processes=None, devices='1', gpus=None, auto_select_gpus=False, tpu_cores=None, ipus=None, enable_progress_bar=True, overfit_batches=0.0, track_grad_norm=-1, check_val_every_n_epoch=1, fast_dev_run=False, accumulate_grad_batches=None, max_epochs=10000, min_epochs=None, max_steps=-1, min_steps=None, max_time=None, limit_train_batches=None, limit_val_batches=None, limit_test_batches=None, limit_predict_batches=None, val_check_interval=None, log_every_n_steps=50, accelerator='gpu', strategy=None, sync_batchnorm=False, precision=32, enable_model_summary=True, weights_save_path=None, num_sanity_val_steps=2, resume_from_checkpoint='/home/user/piper/epoch=2665-step=1182078.ckpt', profiler=None, benchmark=None, deterministic=None, reload_dataloaders_every_n_epochs=0, auto_lr_find=False, replace_sampler_ddp=True, detect_anomaly=False, auto_scale_batch_size=False, plugins=None, amp_backend='native', amp_level=None, move_metrics_to_cpu=False, multiple_trainloader_mode='max_size_cycle', batch_size=32, validation_split=0.0, num_test_examples=0, max_phoneme_ids=None, hidden_channels=192, inter_channels=192, filter_channels=768, n_layers=6, n_heads=2, seed=1234)
/home/user/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/trainer/connectors/checkpoint_connector.py:52: LightningDeprecationWarning: Setting `Trainer(resume_from_checkpoint=)` is deprecated in v1.5 and will be removed in v1.7. Please pass `Trainer.fit(ckpt_path=)` directly instead.
  rank_zero_deprecation(
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
DEBUG:piper_train:Checkpoints will be saved every 1 epoch(s)
DEBUG:vits.dataset:Loading dataset: /home/user/piper/my-training/dataset.jsonl
/home/user/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py:731: LightningDeprecationWarning: `trainer.resume_from_checkpoint` is deprecated in v1.5 and will be removed in v2.0. Specify the fit checkpoint path with `trainer.fit(ckpt_path=)` instead.
  ckpt_path = ckpt_path or self.resume_from_checkpoint
Missing logger folder: /home/user/piper/my-training/lightning_logs
Restoring states from the checkpoint path at /home/user/piper/epoch=2665-step=1182078.ckpt
DEBUG:fsspec.local:open file: /home/user/piper/epoch=2665-step=1182078.ckpt
/home/user/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/callbacks/model_checkpoint.py:345: UserWarning: The dirpath has changed from '/ssd/piper/out-train/lightning_logs/version_1/checkpoints' to '/home/user/piper/my-training/lightning_logs/version_0/checkpoints', therefore `best_model_score`, `kth_best_model_path`, `kth_value`, `last_model_path` and `best_k_models` won't be reloaded. Only `best_model_path` will be reloaded.
  warnings.warn(
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
DEBUG:fsspec.local:open file: /home/user/piper/my-training/lightning_logs/version_0/hparams.yaml
Restored all states from the checkpoint file at /home/user/piper/epoch=2665-step=1182078.ckpt
/home/user/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/utilities/data.py:153: UserWarning: Total length of `DataLoader` across ranks is zero. Please make sure this was your intention.
  rank_zero_warn(
/home/user/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/trainer/connectors/data_connector.py:236: PossibleUserWarning: The dataloader, train_dataloader, does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` (try 32 which is the number of cpus on this machine) in the `DataLoader` init to improve performance.
  rank_zero_warn(
/home/user/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py:1892: PossibleUserWarning: The number of training batches (6) is smaller than the logging interval Trainer(log_every_n_steps=50). Set a lower value for log_every_n_steps if you want to see logs for the training epoch.
  rank_zero_warn(
Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/home/user/piper/src/python/piper_train/__main__.py", line 147, in <module>
    main()
  File "/home/user/piper/src/python/piper_train/__main__.py", line 124, in main
    trainer.fit(model)
  File "/home/user/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 696, in fit
    self._call_and_handle_interrupt(
  File "/home/user/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 650, in _call_and_handle_interrupt
    return trainer_fn(*args, **kwargs)
  File "/home/user/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 735, in _fit_impl
    results = self._run(model, ckpt_path=self.ckpt_path)
  File "/home/user/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1166, in _run
    results = self._run_stage()
  File "/home/user/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1252, in _run_stage
    return self._run_train()
  File "/home/user/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1283, in _run_train
    self.fit_loop.run()
  File "/home/user/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/loops/loop.py", line 200, in run
    self.advance(*args, **kwargs)
  File "/home/user/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/loops/fit_loop.py", line 271, in advance
    self._outputs = self.epoch_loop.run(self._data_fetcher)
  File "/home/user/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/loops/loop.py", line 200, in run
    self.advance(*args, **kwargs)
  File "/home/user/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/loops/epoch/training_epoch_loop.py", line 203, in advance
    batch_output = self.batch_loop.run(kwargs)
  File "/home/user/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/loops/loop.py", line 200, in run
    self.advance(*args, **kwargs)
  File "/home/user/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/loops/batch/training_batch_loop.py", line 87, in advance
    outputs = self.optimizer_loop.run(optimizers, kwargs)
  File "/home/user/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/loops/loop.py", line 200, in run
    self.advance(*args, **kwargs)
  File "/home/user/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 201, in advance
    result = self._run_optimization(kwargs, self._optimizers[self.optim_progress.optimizer_position])
  File "/home/user/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 248, in _run_optimization
    self._optimizer_step(optimizer, opt_idx, kwargs.get("batch_idx", 0), closure)
  File "/home/user/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 358, in _optimizer_step
    self.trainer._call_lightning_module_hook(
  File "/home/user/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1550, in _call_lightning_module_hook
    output = fn(*args, **kwargs)
  File "/home/user/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/core/module.py", line 1705, in optimizer_step
    optimizer.step(closure=optimizer_closure)
  File "/home/user/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/core/optimizer.py", line 168, in step
    step_output = self._strategy.optimizer_step(self._optimizer, self._optimizer_idx, closure, **kwargs)
  File "/home/user/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/strategies/strategy.py", line 216, in optimizer_step
    return self.precision_plugin.optimizer_step(model, optimizer, opt_idx, closure, **kwargs)
  File "/home/user/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/plugins/precision/precision_plugin.py", line 153, in optimizer_step
    return optimizer.step(closure=closure, **kwargs)
  File "/home/user/piper/src/python/.venv/lib/python3.10/site-packages/torch/optim/lr_scheduler.py", line 68, in wrapper
    return wrapped(*args, **kwargs)
  File "/home/user/piper/src/python/.venv/lib/python3.10/site-packages/torch/optim/optimizer.py", line 140, in wrapper
    out = func(*args, **kwargs)
  File "/home/user/piper/src/python/.venv/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/home/user/piper/src/python/.venv/lib/python3.10/site-packages/torch/optim/adamw.py", line 120, in step
    loss = closure()
  File "/home/user/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/plugins/precision/precision_plugin.py", line 138, in _wrap_closure
    closure_result = closure()
  File "/home/user/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 146, in __call__
    self._result = self.closure(*args, **kwargs)
  File "/home/user/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 132, in closure
    step_output = self._step_fn()
  File "/home/user/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 407, in _training_step
    training_step_output = self.trainer._call_strategy_hook("training_step", *kwargs.values())
  File "/home/user/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1704, in _call_strategy_hook
    output = fn(*args, **kwargs)
  File "/home/user/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/strategies/strategy.py", line 358, in training_step
    return self.model.training_step(*args, **kwargs)
  File "/home/user/piper/src/python/piper_train/vits/lightning.py", line 191, in training_step
    return self.training_step_g(batch)
  File "/home/user/piper/src/python/piper_train/vits/lightning.py", line 230, in training_step_g
    y_hat_mel = mel_spectrogram_torch(
  File "/home/user/piper/src/python/piper_train/vits/mel_processing.py", line 120, in mel_spectrogram_torch
    torch.stft(
  File "/home/user/piper/src/python/.venv/lib/python3.10/site-packages/torch/functional.py", line 632, in stft
    return _VF.stft(input, n_fft, hop_length, win_length, window,  # type: ignore[attr-defined]
RuntimeError: cuFFT error: CUFFT_INTERNAL_ERROR

I did some research, and it seems this issue is caused by a bug in cuda-11.7, as mentioned here github.com/pytorch/pytorch/issues/88038. I also tried the nvidia/pytorch:22.03-py3 docker image, but that also has some support issues with the 4090?!
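
For context: the RTX 4090 reports compute capability 8.9 (Ada Lovelace), and support for that architecture was introduced with CUDA 11.8, so a PyTorch wheel built against CUDA 11.7 predates the card. The sketch below encodes only that one threshold. The function names are made up for illustration, and the torch calls run only if torch is installed.

```python
def parse_version(text: str) -> tuple:
    """Turn a string like '11.7' into (11, 7) so versions compare numerically."""
    return tuple(int(part) for part in text.split(".")[:2])


def build_is_too_old_for_ada(capability: tuple, cuda_version: str) -> bool:
    """True if the GPU is Ada (8.9) and the CUDA build is older than 11.8.

    Only the threshold discussed in this thread is checked; other
    architectures always return False here.
    """
    return capability == (8, 9) and parse_version(cuda_version) < (11, 8)


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("torch is not installed in this environment")
    else:
        if torch.cuda.is_available() and torch.version.cuda:
            cap = torch.cuda.get_device_capability(0)
            built = torch.version.cuda
            print(f"capability={cap}, torch CUDA build={built}, "
                  f"too_old={build_is_too_old_for_ada(cap, built)}")
        else:
            print("no CUDA device visible to torch")
```

Run inside the training venv, this prints which CUDA version the installed torch was built against, which is the number that matters here rather than the system-wide CUDA toolkit.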

My question: Are there any workarounds to get an RTX 4090 running or any plans to upgrade to Torch >=2? It's a pity that I can't use it for training...

And also thanks for the great work!

Dec 03 '23 17:12 ei23fxg

I too would like to train with an RTX 4090. I'd be interested in whether or not you were able to figure out a workaround. I'd be buying the 4090 specifically for this purpose, which is quite an investment if it doesn't work. If the RTX 4090 can't work, what's the best GPU to get for training with Piper?

Dec 04 '23 21:12 aaronnewsome

Hi, thank you very much for your great work!

Here is how I got it to work with an RTX 4090 on WSL2 (I use Windows 10).

Install the Python development headers:

sudo apt-get install python3-dev

Then create and activate a Python virtual environment:

cd piper/src/python
python3 -m venv .venv
source .venv/bin/activate

Update pip, wheel, and setuptools:

pip3 install --upgrade pip
pip3 install --upgrade wheel setuptools

Install this version of PyTorch:

pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

Change requirements.txt to:

cython>=0.29.0,<1
librosa>=0.9.2,<1
piper-phonemize~=1.1.0
numpy>=1.19.0
onnxruntime>=1.11.0
pytorch-lightning~=1.9.0
onnx

Then run:

pip3 install -e .

Build monotonic_align:

chmod +x build_monotonic_align.sh
./build_monotonic_align.sh

I hope this helps.
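
After following steps like these, it can help to print what actually ended up in the venv, since pip may have resolved different versions than expected. This is a small standard-library sketch; the helper name `installed_versions` is made up for illustration.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each package name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    names = ["torch", "pytorch-lightning", "torchmetrics", "piper-phonemize"]
    for name, version in installed_versions(names).items():
        print(f"{name}: {version or 'not installed'}")
```

A torch wheel installed from the cu118 index typically shows a "+cu118" suffix in its version string, which is a quick way to see that the CUDA 11.8 build was picked up.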

Dec 04 '23 23:12 lpscr

This is great lpscr! I don't see anything there that's specific to running in a WSL environment, so it should work on an Ubuntu system. I'll go ahead and get an RTX 4090 and see if I can replicate what you've detailed above. Wish me luck!

Dec 04 '23 23:12 aaronnewsome

@aaronnewsome I can confirm that @lpscr's workaround works successfully on a 4090! And it is so much faster than on the GTX1060!!

The missing part for me was pytorch-lightning~=1.9.0. Thanks @lpscr!

Nevertheless, preprocessing has some problems with this setup, but you can use the official installation method in a separate venv for that and be fine.

Edit: Setting checkpoint_epochs=50 or 100 creates fewer checkpoints, but training is a lot faster. I get ~1 epoch per second (60 epochs per minute)!

Dec 05 '23 11:12 ei23fxg

60 epochs per minute!!

I'm getting about 20-30 epochs per HOUR with quality high, 1150 voice samples, using an RTX 3060 - 6GB. CPU only performance is not even worth mentioning, a waste of electricity if you ask me.

I'm placing my order for 4090 today!

Dec 05 '23 15:12 aaronnewsome

Writing a checkpoint to disk every epoch is a bottleneck, I think. My test was with 500 voice samples, also on high quality; I will record more samples and post results later. Nevertheless, the 4090 is a beast and totally worth it. I got mine last month for around 1800€, but prices are currently rising.

Dec 05 '23 23:12 ei23fxg

Just for fun, I tried the thorsten-voice dataset with 22672 voice samples. Preprocessing took quite some time (~20 min), but that has to run on the CPU. Results for the main training on the GPU look a bit different here: one epoch takes ~2 minutes on the RTX 4090 (batch size 64), but that is 22672 voice samples, man... 1150 should be fast.
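
Because epochs differ in size between datasets, the figures reported in this thread are easier to compare as samples per second. The sketch below does that arithmetic. The 25 epochs/hour value is an assumed midpoint of the reported 20-30, and batch size, clip length, and checkpoint interval differ between runs, so treat the result as a rough comparison only.

```python
def samples_per_second(num_samples: int, epochs: float, seconds: float) -> float:
    """Throughput in training samples per second over the given time span."""
    return num_samples * epochs / seconds


# Figures as reported in the thread; labels are only descriptive.
reports = {
    "RTX 4090, 500 samples, ~1 epoch/s": samples_per_second(500, 1, 1),
    "RTX 4090, 22672 samples, ~2 min/epoch": samples_per_second(22672, 1, 120),
    "RTX 3060, 1150 samples, ~25 epochs/h (assumed midpoint)":
        samples_per_second(1150, 25, 3600),
}

for label, rate in reports.items():
    print(f"{label}: {rate:.0f} samples/s")
```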

Dec 06 '23 00:12 ei23fxg

I really appreciate you adding more context around the performance of the 4090 ei23fxg. Many, many thanks.

It makes me think there should be some kind of effort started to benchmark and catalog performance so that new users like me can understand what we're getting into with all this.

It could also be a great place for curious users to see which setups work, what kind of tweaks need to be done and so on. I'm really appreciative of this project and I find it just simply amazing. I'm rather impressed at myself for having the patience to actually get a training done, since I'm not an expert in any of these concepts. I feel like I've stumbled upon it way too early since it hasn't quite progressed to the "anyone can do it" stage.

I'd be willing to help organizing some kind of a benchmarking standard test. If everyone benchmarks the same samples, with the same software versions and settings, it could be very useful to collect those stats and make them browseable.

Dec 06 '23 01:12 aaronnewsome

Thanks to @lpscr. Following your steps, I can train on a 4060 Ti.

Dec 06 '23 01:12 qt06

Hi everyone @qt06 @aaronnewsome @ei23fxg, happy I could help with this :) As @ei23fxg says, you need to change the pytorch-lightning version to pytorch-lightning~=1.9.0.

@ei23fxg, thank you for the speed-up tip.

I think it would be a good idea to put this somewhere in the training section of https://github.com/rhasspy/piper/blob/master/TRAINING.md, because the RTX 4090 is a very powerful GPU and it's sad if you can't use it with this amazing repo.

Happy training to all ;)

Dec 06 '23 21:12 lpscr

If you don't mind editing the code and don't want to change versions for whatever reason, you can simply move the input for that portion of the model to the CPU, run it there, and then push the resulting tensor back to the GPU. There's probably a bit of overhead, but it is likely still much faster than training on the CPU alone.

In particular, in src/python/piper_train/vits/lightning.py:

        y_hat_mel = mel_spectrogram_torch(
            y_hat.squeeze(1).to("cpu"),
            self.hparams.filter_length,
            self.hparams.mel_channels,
            self.hparams.sample_rate,
            self.hparams.hop_length,
            self.hparams.win_length,
            self.hparams.mel_fmin,
            self.hparams.mel_fmax,
        )
        y_hat_mel = y_hat_mel.to("cuda")
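
The same idea can be written as a small reusable wrapper that remembers which device the input came from instead of hard-coding "cuda". This is only a sketch: `run_on_cpu` is a made-up helper, not part of Piper, and it assumes the wrapped function takes the tensor as its first argument and returns a single tensor.

```python
import functools


def run_on_cpu(fn):
    """Wrap fn so its first argument is moved to the CPU and the result
    is moved back to the device the input came from."""

    @functools.wraps(fn)
    def wrapper(x, *args, **kwargs):
        original_device = x.device
        result = fn(x.to("cpu"), *args, **kwargs)
        return result.to(original_device)

    return wrapper


# Hypothetical usage inside lightning.py (not executed here):
#   mel_on_cpu = run_on_cpu(mel_spectrogram_torch)
#   y_hat_mel = mel_on_cpu(y_hat.squeeze(1), self.hparams.filter_length, ...)
```

Since the wrapper only relies on `.device` and `.to()`, it works with any object that offers those two, including torch tensors on a specific GPU index.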

Dec 08 '23 17:12 mitchelldehaven

Is there a way to fix training not being possible on a Win11 RTX 4050 laptop GPU? I have been trying to train locally for a week now, and it has never worked. I get an NVRTC error, or this:

Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/home/user/piper/src/python/piper_train/__main__.py", line 147, in <module>
    main()
  File "/home/user/piper/src/python/piper_train/__main__.py", line 124, in main
    trainer.fit(model)
  File "/home/user/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 696, in fit
    self._call_and_handle_interrupt(
  File "/home/user/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 650, in _call_and_handle_interrupt
    return trainer_fn(*args, **kwargs)
  File "/home/user/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 735, in _fit_impl
    results = self._run(model, ckpt_path=self.ckpt_path)
  File "/home/user/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1166, in _run
    results = self._run_stage()
  File "/home/user/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1252, in _run_stage
    return self._run_train()
  File "/home/user/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1283, in _run_train
    self.fit_loop.run()
  File "/home/user/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/loops/loop.py", line 200, in run
    self.advance(*args, **kwargs)
  File "/home/user/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/loops/fit_loop.py", line 271, in advance
    self._outputs = self.epoch_loop.run(self._data_fetcher)
  File "/home/user/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/loops/loop.py", line 200, in run
    self.advance(*args, **kwargs)
  File "/home/user/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/loops/epoch/training_epoch_loop.py", line 203, in advance
    batch_output = self.batch_loop.run(kwargs)
  File "/home/user/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/loops/loop.py", line 200, in run
    self.advance(*args, **kwargs)
  File "/home/user/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/loops/batch/training_batch_loop.py", line 87, in advance
    outputs = self.optimizer_loop.run(optimizers, kwargs)
  File "/home/user/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/loops/loop.py", line 200, in run
    self.advance(*args, **kwargs)
  File "/home/user/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 201, in advance
    result = self._run_optimization(kwargs, self._optimizers[self.optim_progress.optimizer_position])
  File "/home/user/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 248, in _run_optimization
    self._optimizer_step(optimizer, opt_idx, kwargs.get("batch_idx", 0), closure)
  File "/home/user/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 358, in _optimizer_step
    self.trainer._call_lightning_module_hook(
  File "/home/user/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1550, in _call_lightning_module_hook
    output = fn(*args, **kwargs)
  File "/home/user/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/core/module.py", line 1705, in optimizer_step
    optimizer.step(closure=optimizer_closure)
  File "/home/user/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/core/optimizer.py", line 168, in step
    step_output = self._strategy.optimizer_step(self._optimizer, self._optimizer_idx, closure, **kwargs)
  File "/home/user/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/strategies/strategy.py", line 216, in optimizer_step
    return self.precision_plugin.optimizer_step(model, optimizer, opt_idx, closure, **kwargs)
  File "/home/user/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/plugins/precision/precision_plugin.py", line 153, in optimizer_step
    return optimizer.step(closure=closure, **kwargs)
  File "/home/user/piper/src/python/.venv/lib/python3.10/site-packages/torch/optim/lr_scheduler.py", line 65, in wrapper
    return wrapped(*args, **kwargs)
  File "/home/user/piper/src/python/.venv/lib/python3.10/site-packages/torch/optim/optimizer.py", line 88, in wrapper
    return func(*args, **kwargs)
  File "/home/user/piper/src/python/.venv/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/home/user/piper/src/python/.venv/lib/python3.10/site-packages/torch/optim/adamw.py", line 100, in step
    loss = closure()
  File "/home/user/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/plugins/precision/precision_plugin.py", line 138, in _wrap_closure
    closure_result = closure()
  File "/home/user/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 146, in __call__
    self._result = self.closure(*args, **kwargs)
  File "/home/user/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 132, in closure
    step_output = self._step_fn()
  File "/home/user/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 407, in _training_step
    training_step_output = self.trainer._call_strategy_hook("training_step", *kwargs.values())
  File "/home/user/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1704, in _call_strategy_hook
    output = fn(*args, **kwargs)
  File "/home/user/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/strategies/strategy.py", line 358, in training_step
    return self.model.training_step(*args, **kwargs)
  File "/home/user/piper/src/python/piper_train/vits/lightning.py", line 191, in training_step
    return self.training_step_g(batch)
  File "/home/user/piper/src/python/piper_train/vits/lightning.py", line 214, in training_step_g
    ) = self.model_g(x, x_lengths, spec, spec_lengths, speaker_ids)
  File "/home/user/piper/src/python/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/user/piper/src/python/piper_train/vits/models.py", line 625, in forward
    z, m_q, logs_q, y_mask = self.enc_q(y, y_lengths, g=g)
  File "/home/user/piper/src/python/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/user/piper/src/python/piper_train/vits/models.py", line 292, in forward
    x = self.enc(x, x_mask, g=g)
  File "/home/user/piper/src/python/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/user/piper/src/python/piper_train/vits/modules.py", line 199, in forward
    acts = fused_add_tanh_sigmoid_multiply(x_in, g_l, n_channels_tensor)
RuntimeError: The following operation failed in the TorchScript interpreter.
Traceback of TorchScript (most recent call last):
RuntimeError: nvrtc: error: invalid value for --gpu-architecture (-arch)

nvrtc compilation failed:

#define NAN __int_as_float(0x7fffffff) #define POS_INFINITY __int_as_float(0x7f800000) #define NEG_INFINITY __int_as_float(0xff800000)

template<typename T> device T maximum(T a, T b) { return isnan(a) ? a : (a > b ? a : b); }

template<typename T> device T minimum(T a, T b) { return isnan(a) ? a : (a < b ? a : b); }

extern "C" global void fused_tanh_sigmoid_mul(float* tv_, float* tv__, float* aten_mul, float* aten_sigmoid, float* aten_tanh) { { if ((long long)(threadIdx.x) + 512ll * (long long)(blockIdx.x)<2661120ll ? 1 : 0) { float tv___1 = ldg(tv + ((long long)(threadIdx.x) + 512ll * (long long)(blockIdx.x)) % 221760ll + 2ll * ((((long long)(threadIdx.x) + 512ll * (long long)(blockIdx.x)) / 221760ll) * 221760ll)); aten_tanh[(long long)(threadIdx.x) + 512ll * (long long)(blockIdx.x)] = tanhf(tv___1); float tv__1 = _ldg(tv + ((long long)(threadIdx.x) + 512ll * (long long)(blockIdx.x)) % 221760ll + 2ll * ((((long long)(threadIdx.x) + 512ll * (long long)(blockIdx.x)) / 221760ll) * 221760ll)); aten_sigmoid[(long long)(threadIdx.x) + 512ll * (long long)(blockIdx.x)] = 1.f / (1.f + (expf(0.f - tv__1))); aten_mul[(long long)(threadIdx.x) + 512ll * (long long)(blockIdx.x)] = (tanhf(tv___1)) * (1.f / (1.f + (expf(0.f - tv__1)))); }} }

Dec 09 '23 10:12 FemBoxbrawl

@FemBoxbrawl Laptops often have two GPUs active (Intel / Nvidia). Did you select the Nvidia GPU?

Dec 13 '23 18:12 ei23fxg

@ei23fxg I do have both, but I don't know how to select the GPU.

Dec 13 '23 19:12 FemBoxbrawl

I got it to work finally after help from @graylington the legend.

Dec 14 '23 08:12 FemBoxbrawl

Could you please tell me how you solved the problem? I have run into the same issue with a Win11 RTX 4050 laptop GPU. Thanks!

Jan 05 '24 09:01 lgy250

This is from Graylington. I can't remember exactly what I did in detail, but I troubleshot it and it worked. Just follow this:

(Graylington's original message)

"Here is how I got it to work with an RTX 4090 on WSL2 (I use Windows 10).

Install the Python development headers:

sudo apt-get install python3-dev

Then create and activate a Python virtual environment:

cd piper/src/python
python3 -m venv .venv
source .venv/bin/activate

Update pip, wheel, and setuptools:

pip3 install --upgrade pip
pip3 install --upgrade wheel setuptools

Install this version of PyTorch:

pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

Change requirements.txt to:

cython>=0.29.0,<1
librosa>=0.9.2,<1
piper-phonemize~=1.1.0
numpy>=1.19.0
onnxruntime>=1.11.0
pytorch-lightning~=1.9.0
onnx

Then run:

pip3 install -e .

Build monotonic_align:

chmod +x build_monotonic_align.sh
./build_monotonic_align.sh

I hope this helps.

This works great on my 4090! Problem is, I can no longer run inference."

Jan 05 '24 09:01 FemBoxbrawl

Changing requirements.txt was probably the most crucial step, I think, but I don't remember.

Jan 05 '24 09:01 FemBoxbrawl

Thanks, I have resolved the issue! Best wishes!

Jan 08 '24 02:01 lgy250

How did you do it? After changing pytorch-lightning to 1.8.4 or above, I'm receiving this error:

Traceback (most recent call last):
  File "/usr/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/zywek/git/piper/src/python/piper_train/main.py", line 147, in <module>
    main()
  File "/home/zywek/git/piper/src/python/piper_train/main.py", line 124, in main
    trainer.fit(model)
  File "/home/zywek/git/piper/src/python/env/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 603, in fit
    call._call_and_handle_interrupt(
  File "/home/zywek/git/piper/src/python/env/lib/python3.9/site-packages/pytorch_lightning/trainer/call.py", line 38, in _call_and_handle_interrupt
    return trainer_fn(*args, **kwargs)
  File "/home/zywek/git/piper/src/python/env/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 645, in _fit_impl
    self._run(model, ckpt_path=self.ckpt_path)
  File "/home/zywek/git/piper/src/python/env/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 1098, in _run
    results = self._run_stage()
  File "/home/zywek/git/piper/src/python/env/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 1177, in _run_stage
    self._run_train()
  File "/home/zywek/git/piper/src/python/env/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 1190, in _run_train
    self._run_sanity_check()
  File "/home/zywek/git/piper/src/python/env/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 1262, in _run_sanity_check
    val_loop.run()
  File "/home/zywek/git/piper/src/python/env/lib/python3.9/site-packages/pytorch_lightning/loops/loop.py", line 199, in run
    self.advance(*args, **kwargs)
  File "/home/zywek/git/piper/src/python/env/lib/python3.9/site-packages/pytorch_lightning/loops/dataloader/evaluation_loop.py", line 152, in advance
    dl_outputs = self.epoch_loop.run(self._data_fetcher, dl_max_batches, kwargs)
  File "/home/zywek/git/piper/src/python/env/lib/python3.9/site-packages/pytorch_lightning/loops/loop.py", line 199, in run
    self.advance(*args, **kwargs)
  File "/home/zywek/git/piper/src/python/env/lib/python3.9/site-packages/pytorch_lightning/loops/epoch/evaluation_epoch_loop.py", line 137, in advance
    output = self._evaluation_step(**kwargs)
  File "/home/zywek/git/piper/src/python/env/lib/python3.9/site-packages/pytorch_lightning/loops/epoch/evaluation_epoch_loop.py", line 234, in _evaluation_step
    output = self.trainer._call_strategy_hook(hook_name, *kwargs.values())
  File "/home/zywek/git/piper/src/python/env/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 1480, in _call_strategy_hook
    output = fn(*args, **kwargs)
  File "/home/zywek/git/piper/src/python/env/lib/python3.9/site-packages/pytorch_lightning/strategies/strategy.py", line 390, in validation_step
    return self.model.validation_step(*args, **kwargs)
  File "/home/zywek/git/piper/src/python/piper_train/vits/lightning.py", line 302, in validation_step
    self.logger.experiment.add_audio(
TypeError: add_audio() missing 1 required positional argument: 'global_step'
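For readers puzzling over the last line of that traceback: this TypeError is what Python raises whenever a method is called without an argument its signature requires. Below is a minimal sketch of that mechanism and of passing the step explicitly. `StrictExperiment` is a hypothetical stand-in written for this illustration, not the real object behind `self.logger.experiment`, and its signature is an assumption.

```python
class StrictExperiment:
    """Hypothetical logger backend whose add_audio REQUIRES global_step (assumption)."""

    def __init__(self):
        self.calls = []

    def add_audio(self, tag, snd_tensor, global_step, sample_rate=22050):
        # Record the call instead of writing to a real log.
        self.calls.append((tag, global_step, sample_rate))


experiment = StrictExperiment()
audio = [0.0] * 16  # placeholder samples, not real audio

# Calling without global_step reproduces the same kind of TypeError.
try:
    experiment.add_audio("sample", audio, sample_rate=22050)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# Passing the step explicitly satisfies the signature.
experiment.add_audio("sample", audio, global_step=0, sample_rate=22050)

print(error_message)
```

Whether the right fix in piper_train is to pass a step or to use a different logger depends on which experiment object is active, which this thread does not show.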

Jan 19 '24 17:01 zywek123

Hi, thank you very much for your great work!

Here is how I got it working with an RTX 4090 on WSL2 (I use Windows 10).

Install the Python development headers:

sudo apt-get install python3-dev

Then create a Python virtual environment and activate it:
cd piper/src/python
python3 -m venv .venv
source .venv/bin/activate
Update pip, wheel, and setuptools:
pip3 install --upgrade pip
pip3 install --upgrade wheel setuptools
Install this version of PyTorch:
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

Replace the contents of requirements.txt with:
cython>=0.29.0,<1
librosa>=0.9.2,<1
piper-phonemize~=1.1.0
numpy>=1.19.0
onnxruntime>=1.11.0
pytorch-lightning~=1.9.0
onnx
Then run:

pip3 install -e .

Build monotonic_align:
chmod +x build_monotonic_align.sh
./build_monotonic_align.sh

I hope this helps.
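After following steps like these, a quick way to see whether GPU FFTs work at all is a small smoke test run inside the activated virtual environment. This is only a sketch: the function name `cufft_smoke_test` and its status strings are made up for this example, and it uses `torch.fft.rfft` rather than the exact call piper_train makes, on the assumption that any FFT on a CUDA tensor exercises cuFFT.

```python
def cufft_smoke_test(n=1024):
    """Return a short status string describing whether a GPU FFT succeeded."""
    try:
        import torch
    except ImportError:
        return "torch-not-installed"

    if not torch.cuda.is_available():
        return "cuda-unavailable"

    try:
        signal = torch.randn(n, device="cuda")
        spectrum = torch.fft.rfft(signal)  # FFT on a CUDA tensor
        torch.cuda.synchronize()           # surface asynchronous CUDA errors here
    except RuntimeError as exc:
        # A cuFFT failure such as CUFFT_INTERNAL_ERROR would be reported here.
        return f"fft-failed: {exc}"

    # rfft of n real samples yields n // 2 + 1 complex bins.
    return "ok" if spectrum.shape[-1] == n // 2 + 1 else "unexpected-shape"


status = cufft_smoke_test()
print(status)
```

If this prints "ok" but training still fails, the problem is probably elsewhere in the setup; if it prints "fft-failed", the installed torch/CUDA combination is the first thing to check.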

Thanks to you Guys, I think I may be almost there.

My new issue:

lightning_fabric/utilities/types.py", line 36, in <module>
    UntypedStorage: TypeAlias = torch.UntypedStorage
AttributeError: module 'torch' has no attribute 'UntypedStorage'

Any help?

[UPDATE]

The issue was related to the torch version I had installed. I reinstalled everything and it looks like it's working.
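For anyone hitting the same AttributeError, a small report of what is actually installed in the environment can save a reinstall. The sketch below is an assumption-laden helper, not part of piper: the function names are invented here, and the package list is simply the set of packages mentioned in this thread.

```python
from importlib import metadata


def installed_versions(packages=("torch", "pytorch-lightning", "torchmetrics")):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def has_untyped_storage():
    """True/False if torch imports, None if torch is not installed."""
    try:
        import torch
    except ImportError:
        return None
    return hasattr(torch, "UntypedStorage")


report = installed_versions()
report["torch.UntypedStorage present"] = has_untyped_storage()
for key, value in report.items():
    print(f"{key}: {value}")
```

If the last line prints False, the installed torch does not provide the attribute that lightning_fabric expects, which matches the error above.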

Thank you Guys!

May 16 '24 19:05 eusthace811

Thank you lpscr! I've been trying to solve this dreaded "RuntimeError: cuFFT error: CUFFT_INTERNAL_ERROR" for 3 days. I was about to give up when I came across a comment on a YouTube video that there was a fix mentioned on the issues board. For reference, my GPU is listed as: NVIDIA RTX 4000 Ada Generation Laptop GPU

Jun 30 '24 05:06 nikywilliams

I ran into this with my RTX 4060 Ti as well (on Ubuntu 24.04). I struggled to get @lpscr's solution to work; my mistake was that I wasn't following the instructions properly. I only updated the pytorch-lightning version in requirements.txt (as it is different). What I should have done was replace the entire file with the contents exactly as pasted. Now I am no longer getting the "RuntimeError: cuFFT error: CUFFT_INTERNAL_ERROR" error! Thank you so much @lpscr!
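To avoid the same partial-edit mistake, one option is to compare requirements.txt against the list pasted earlier in this thread. The following is a minimal sketch: `compare_requirements` is a made-up helper, the `EXPECTED` lines are copied from @lpscr's comment, and the `partial` text is a hypothetical half-edited file used only for demonstration.

```python
# Lines copied from the requirements.txt pasted earlier in this thread.
EXPECTED = [
    "cython>=0.29.0,<1",
    "librosa>=0.9.2,<1",
    "piper-phonemize~=1.1.0",
    "numpy>=1.19.0",
    "onnxruntime>=1.11.0",
    "pytorch-lightning~=1.9.0",
    "onnx",
]


def compare_requirements(text, expected=EXPECTED):
    """Return (missing, extra) requirement lines, ignoring blanks and comments."""
    actual = [
        line.strip()
        for line in text.splitlines()
        if line.strip() and not line.strip().startswith("#")
    ]
    missing = [line for line in expected if line not in actual]
    extra = [line for line in actual if line not in expected]
    return missing, extra


# Hypothetical half-edited file: only some lines match the pasted list.
partial = """cython>=0.29.0,<1
librosa>=0.9.2,<1
pytorch-lightning~=1.9.0
some-other-pin==1.0
"""

missing, extra = compare_requirements(partial)
print("missing:", missing)
print("extra:", extra)
```

To check a real file, pass the result of `open("requirements.txt").read()` instead of `partial`; both lists being empty means the file matches the pasted contents.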

Dec 30 '24 02:12 tomuta

Thank you lpscr, and Thorsten for his video guide!

I managed to get it working on Debian bookworm with Python 3.12. I encountered some conflicts between PyTorch and CUDA, but managed to solve them with the install described here: https://github.com/googlecolab/colabtools/issues/4344. Now the 4090 in my laptop is running like a charm.

Feb 18 '25 22:02 marienkamphof
