AMD GPU support
I am trying to train a voice model with my 6700XT, but when I run the training command I get this error: pytorch_lightning.utilities.exceptions.MisconfigurationException: No supported gpu backend found!
Full output:
(.venv) ishaan@ishaanpc:~/Music/training/piper/src/python$ python3 -m piper_train --dataset-dir ~/Music/training/train-me --accelerator 'gpu' --devices 1 --batch-size 32 --validation-split 0.0 --num-test-examples 0 --max_epochs 4000 --resume_from_checkpoint "./epoch=663-step=646736.ckpt" --checkpoint-epochs 1 --precision 16 --max-phoneme-ids 400 --quality medium
DEBUG:piper_train:Namespace(dataset_dir='/home/ishaan/Music/training/train-me', checkpoint_epochs=1, quality='medium', resume_from_single_speaker_checkpoint=None, logger=True, enable_checkpointing=True, default_root_dir=None, gradient_clip_val=None, gradient_clip_algorithm=None, num_nodes=1, num_processes=None, devices='1', gpus=None, auto_select_gpus=False, tpu_cores=None, ipus=None, enable_progress_bar=True, overfit_batches=0.0, track_grad_norm=-1, check_val_every_n_epoch=1, fast_dev_run=False, accumulate_grad_batches=None, max_epochs=4000, min_epochs=None, max_steps=-1, min_steps=None, max_time=None, limit_train_batches=None, limit_val_batches=None, limit_test_batches=None, limit_predict_batches=None, val_check_interval=None, log_every_n_steps=50, accelerator='gpu', strategy=None, sync_batchnorm=False, precision=16, enable_model_summary=True, weights_save_path=None, num_sanity_val_steps=2, resume_from_checkpoint='./epoch=663-step=646736.ckpt', profiler=None, benchmark=None, deterministic=None, reload_dataloaders_every_n_epochs=0, auto_lr_find=False, replace_sampler_ddp=True, detect_anomaly=False, auto_scale_batch_size=False, plugins=None, amp_backend='native', amp_level=None, move_metrics_to_cpu=False, multiple_trainloader_mode='max_size_cycle', batch_size=32, validation_split=0.0, num_test_examples=0, max_phoneme_ids=400, hidden_channels=192, inter_channels=192, filter_channels=768, n_layers=6, n_heads=2, seed=1234)
Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/home/ishaan/Music/training/piper/src/python/piper_train/__main__.py", line 147, in <module>
    main()
  File "/home/ishaan/Music/training/piper/src/python/piper_train/__main__.py", line 60, in main
    trainer = Trainer.from_argparse_args(args)
  File "/home/ishaan/Music/training/.venv/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 2449, in from_argparse_args
    return from_argparse_args(cls, args, **kwargs)
  File "/home/ishaan/Music/training/.venv/lib/python3.10/site-packages/pytorch_lightning/utilities/argparse.py", line 72, in from_argparse_args
    return cls(**trainer_kwargs)
  File "/home/ishaan/Music/training/.venv/lib/python3.10/site-packages/pytorch_lightning/utilities/argparse.py", line 345, in insert_env_defaults
    return fn(self, **kwargs)
  File "/home/ishaan/Music/training/.venv/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 433, in __init__
    self._accelerator_connector = AcceleratorConnector(
  File "/home/ishaan/Music/training/.venv/lib/python3.10/site-packages/pytorch_lightning/trainer/connectors/accelerator_connector.py", line 212, in __init__
    self._accelerator_flag = self._choose_gpu_accelerator_backend()
  File "/home/ishaan/Music/training/.venv/lib/python3.10/site-packages/pytorch_lightning/trainer/connectors/accelerator_connector.py", line 518, in _choose_gpu_accelerator_backend
    raise MisconfigurationException("No supported gpu backend found!")
pytorch_lightning.utilities.exceptions.MisconfigurationException: No supported gpu backend found!
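Reading the traceback, the failure happens inside Lightning's _choose_gpu_accelerator_backend while the Trainer is being constructed, before any training code runs: accelerator='gpu' has to be resolved to a concrete backend, and none was found. The sketch below is a simplified model of that decision. It assumes (from the function name and error message, not from Lightning's source) that the Lightning version used here only considers Apple MPS and CUDA. ROCm builds of PyTorch expose AMD GPUs through the torch.cuda API, so on Linux the check effectively comes down to whether torch.cuda.is_available() returns True. The function choose_gpu_backend and the local exception class are hypothetical stand-ins.

```python
class MisconfigurationException(Exception):
    """Hypothetical local stand-in for Lightning's MisconfigurationException."""


def choose_gpu_backend(mps_available: bool, cuda_available: bool) -> str:
    """Simplified, hypothetical model of how accelerator='gpu' gets resolved.

    Assumption: only Apple MPS and CUDA are considered. A ROCm build of
    PyTorch reports AMD GPUs through torch.cuda, so it lands in the
    "cuda" branch; a CPU-only or CUDA-only build on an AMD machine does not.
    """
    if mps_available:
        return "mps"
    if cuda_available:
        return "cuda"
    raise MisconfigurationException("No supported gpu backend found!")


if __name__ == "__main__":
    # Radeon card + ROCm build of PyTorch: torch.cuda.is_available() is True.
    print(choose_gpu_backend(mps_available=False, cuda_available=True))
    # Radeon card + a non-ROCm build: nothing is available, matching the error above.
    try:
        choose_gpu_backend(mps_available=False, cuda_available=False)
    except MisconfigurationException as exc:
        print(f"error: {exc}")
```

In other words, the error is about which PyTorch wheel is installed, not about the piper_train arguments themselves.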
It is actually possible to run training on AMD GPUs.
The only thing you need to take care of is installing a PyTorch build that is compatible with ROCm.
That is, follow the instructions on https://pytorch.org/get-started/locally/ to install the ROCm version of PyTorch.
Install the required torch first, then run pip install -r requirement*.txt. That is all you have to do.
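Before launching piper_train again, it can help to confirm that the torch in the virtualenv really is the ROCm build after both install steps. This is a minimal sketch that relies only on the standard torch.version.hip, torch.version.cuda, torch.cuda.is_available() and torch.cuda.get_device_name() attributes; the helper name describe_torch_build is hypothetical. On ROCm builds torch.version.hip is a version string, and on other builds it is None.

```python
import importlib.util


def describe_torch_build() -> dict:
    """Summarise which GPU backend the installed PyTorch build targets.

    Hypothetical helper: returns a dict instead of raising, so it also
    runs on machines where torch is not installed.
    """
    if importlib.util.find_spec("torch") is None:
        return {"torch_installed": False, "backend": None}

    import torch

    hip = getattr(torch.version, "hip", None)    # ROCm version string, or None
    cuda = getattr(torch.version, "cuda", None)  # CUDA version string, or None
    gpu_visible = torch.cuda.is_available()      # also True for ROCm builds

    if hip:
        backend = "rocm"
    elif cuda:
        backend = "cuda"
    else:
        backend = "cpu-only"

    return {
        "torch_installed": True,
        "torch_version": torch.__version__,
        "backend": backend,
        "hip": hip,
        "cuda": cuda,
        "gpu_visible": gpu_visible,
        "device_name": torch.cuda.get_device_name(0) if gpu_visible else None,
    }


if __name__ == "__main__":
    for key, value in describe_torch_build().items():
        print(f"{key}: {value}")
```

On a working setup you would expect backend: rocm and gpu_visible: True. If backend shows cuda or cpu-only, the wrong wheel is installed. If it shows rocm but gpu_visible is False, the problem is more likely in the ROCm runtime or driver than in PyTorch itself.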
In my case, I forked this repo and merged some PRs into my fork to experiment with. One of those PRs uses Torch 2.x, so I always use that branch for experiments.
You can see my current working branch here: https://github.com/harish2704/piper/tree/rmcpantoja-master
My machine runs Fedora 42 with an "AMD Radeon RX Vega 56" GPU. The GPU has 8 GB of VRAM, and I found that the batch size should not exceed 16; 16 was the optimal size. With 18, performance degraded, even though it did not throw an out-of-memory error.
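For a card with a different amount of VRAM, the observation above can be turned into a rough first value to try. This is a heuristic, not a measurement: it assumes memory use grows roughly linearly with batch size and scales the single data point (batch size 16 on 8 GB) to other VRAM sizes. The function suggest_batch_size and its parameters are hypothetical. Because 18 still fit on 8 GB but ran slower than 16, treat the result as a starting point and step down if training speed drops.

```python
import math


def suggest_batch_size(vram_gb: float, ref_vram_gb: float = 8.0, ref_batch: int = 16) -> int:
    """Hypothetical rule of thumb: scale the batch size linearly with VRAM.

    Reference point: batch size 16 was the best setting on an 8 GB RX Vega 56.
    Assumes memory use grows roughly linearly with batch size; the result is
    only a starting point to tune from, never less than 1.
    """
    if vram_gb <= 0 or math.isnan(vram_gb):
        raise ValueError("vram_gb must be a positive number")
    # round() rather than floor(), so a card reporting e.g. 7.98 GiB still maps to 16
    return max(1, round(ref_batch * vram_gb / ref_vram_gb))


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("torch not installed; call suggest_batch_size() with your VRAM in GB")
    else:
        if torch.cuda.is_available():
            total_gb = torch.cuda.get_device_properties(0).total_memory / 1024**3
            print(f"{total_gb:.1f} GB VRAM -> try --batch-size {suggest_batch_size(total_gb)}")
        else:
            print("no GPU visible to torch; check the ROCm install first")
```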
Please let me know if you need any more help.
Hi,
Thanks for the response. I have a similar setup: a 5700X, a 6700 10GB and 32GB of RAM, also on Fedora 42. I will try this later today and let you know whether it works. The instructions I was using were these: https://blog.networkchuck.com/posts/how-to-clone-a-voice/ (that guide was written for a 4090, lol). Do you have a set of instructions that worked best for you?