fish-diffusion
CUDA bfloat16 problem
2023-10-23 10:49:30,409 WARNING: logs/HiFiSVC doesn't exist yet!
Global seed set to 594461
Using bfloat16 Automatic Mixed Precision (AMP)
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
Missing logger folder: logs/HiFiSVC
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
| Name | Type | Params
-----------------------------------------------------------
0 | generator | HiFiSinger | 14.9 M
1 | mpd | MultiPeriodDiscriminator | 57.5 M
2 | msd | MultiScaleDiscriminator | 29.6 M
3 | mel_transform | MelSpectrogram | 0
-----------------------------------------------------------
102 M Trainable params
0 Non-trainable params
102 M Total params
408.124 Total estimated model params size (MB)
Sanity Checking: 0it [00:00, ?it/s]/content/env/envs/fish_diffusion/lib/python3.10/site-packages/pytorch_lightning/trainer/connectors/data_connector.py:442: PossibleUserWarning: The dataloader, val_dataloader, does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` (try 8 which is the number of cpus on this machine) in the `DataLoader` init to improve performance.
rank_zero_warn(
Sanity Checking DataLoader 0: 0% 0/2 [00:00<?, ?it/s]Traceback (most recent call last):
File "/content/fish-diffusion/tools/hifisinger/train.py", line 83, in <module>
trainer.fit(model, train_loader, valid_loader, ckpt_path=args.resume)
File "/content/env/envs/fish_diffusion/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 532, in fit
call._call_and_handle_interrupt(
File "/content/env/envs/fish_diffusion/lib/python3.10/site-packages/pytorch_lightning/trainer/call.py", line 43, in _call_and_handle_interrupt
return trainer_fn(*args, **kwargs)
File "/content/env/envs/fish_diffusion/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 571, in _fit_impl
self._run(model, ckpt_path=ckpt_path)
File "/content/env/envs/fish_diffusion/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 980, in _run
results = self._run_stage()
File "/content/env/envs/fish_diffusion/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1021, in _run_stage
self._run_sanity_check()
File "/content/env/envs/fish_diffusion/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1050, in _run_sanity_check
val_loop.run()
File "/content/env/envs/fish_diffusion/lib/python3.10/site-packages/pytorch_lightning/loops/utilities.py", line 181, in _decorator
return loop_run(self, *args, **kwargs)
File "/content/env/envs/fish_diffusion/lib/python3.10/site-packages/pytorch_lightning/loops/evaluation_loop.py", line 115, in run
self._evaluation_step(batch, batch_idx, dataloader_idx)
File "/content/env/envs/fish_diffusion/lib/python3.10/site-packages/pytorch_lightning/loops/evaluation_loop.py", line 376, in _evaluation_step
output = call._call_strategy_hook(trainer, hook_name, *step_kwargs.values())
File "/content/env/envs/fish_diffusion/lib/python3.10/site-packages/pytorch_lightning/trainer/call.py", line 294, in _call_strategy_hook
output = fn(*args, **kwargs)
File "/content/env/envs/fish_diffusion/lib/python3.10/site-packages/pytorch_lightning/strategies/strategy.py", line 391, in validation_step
with self.precision_plugin.val_step_context():
File "/content/env/envs/fish_diffusion/lib/python3.10/contextlib.py", line 135, in __enter__
return next(self.gen)
File "/content/env/envs/fish_diffusion/lib/python3.10/site-packages/pytorch_lightning/plugins/precision/precision_plugin.py", line 170, in val_step_context
with self.forward_context():
File "/content/env/envs/fish_diffusion/lib/python3.10/contextlib.py", line 135, in __enter__
return next(self.gen)
File "/content/env/envs/fish_diffusion/lib/python3.10/site-packages/pytorch_lightning/plugins/precision/amp.py", line 118, in forward_context
with self.autocast_context_manager():
File "/content/env/envs/fish_diffusion/lib/python3.10/site-packages/pytorch_lightning/plugins/precision/amp.py", line 113, in autocast_context_manager
return torch.autocast(self.device, dtype=torch.bfloat16 if self.precision == "bf16-mixed" else torch.half)
File "/content/env/envs/fish_diffusion/lib/python3.10/site-packages/torch/amp/autocast_mode.py", line 234, in __init__
raise RuntimeError('Current CUDA Device does not support bfloat16. Please switch dtype to float16.')
RuntimeError: Current CUDA Device does not support bfloat16. Please switch dtype to float16.
I'm on Google Colab with a T4 GPU (also tried other GPUs) and I still get the same error when training my model.
Hi friend, I'm running into the same error. Have you solved it? Thank you sincerely if you can help.
It's not fixed yet, but I found a workaround. After you create the environment and clone the Fish-Diffusion repo into your Colab, you have to edit the config file, because Colab's T4 GPU doesn't support bfloat16 (so the precision has to be changed, as a quick fix for now). The file is: /content/fish-diffusion/configs/base/trainers/base.py
In that file, change line 18 from 'precision="bf16-mixed",' to 'precision="16-mixed",' and save it. It should work now.
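Instead of hard-coding the value, the precision string could also be picked at runtime based on what the current GPU supports. Here is a minimal sketch; the `pick_precision` helper is hypothetical and not part of fish-diffusion, but `torch.cuda.is_bf16_supported()` is a real PyTorch API:

```python
import torch


def pick_precision() -> str:
    """Return a Lightning-style AMP precision string for the current device.

    bfloat16 AMP needs an Ampere-or-newer GPU (compute capability >= 8.0);
    Colab's T4 is compute capability 7.5, so it falls back to fp16.
    """
    if torch.cuda.is_available() and torch.cuda.is_bf16_supported():
        return "bf16-mixed"
    return "16-mixed"


if __name__ == "__main__":
    # On a T4 this prints "16-mixed"; on an A100 it prints "bf16-mixed".
    print(pick_precision())
```

You could then pass the result as the trainer's `precision` value in the config instead of a fixed string, so the same config runs on both older and newer GPUs.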
Thank you very much for your help, it works now. Looking forward to discussing the model's results with you. Sincerely yours!
Dear friend, this code can fine-tune the text encoder projection layer + diffusion, or fine-tune HiFi-GAN, but have you fine-tuned ContentVec with this code? For example, using different transformer layers in ContentVec to reduce the model size?
thank you sincerely for your help