so-vits-svc-fork
RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
Traceback (most recent call last):
File "/usr/local/bin/svc", line 8, in <module>
sys.exit(cli())
File "/usr/local/lib/python3.9/dist-packages/click/core.py", line 1130, in __call__
return self.main(*args, **kwargs)
File "/usr/local/lib/python3.9/dist-packages/click/core.py", line 1055, in main
rv = self.invoke(ctx)
File "/usr/local/lib/python3.9/dist-packages/click/core.py", line 1657, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/usr/local/lib/python3.9/dist-packages/click/core.py", line 1404, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/usr/local/lib/python3.9/dist-packages/click/core.py", line 760, in invoke
return __callback(*args, **kwargs)
File "/usr/local/lib/python3.9/dist-packages/so_vits_svc_fork/__main__.py", line 129, in train
train(
File "/usr/local/lib/python3.9/dist-packages/so_vits_svc_fork/train.py", line 120, in train
tuner.scale_batch_size(
File "/usr/local/lib/python3.9/dist-packages/lightning/pytorch/tuner/tuning.py", line 93, in scale_batch_size
self._trainer.fit(model, train_dataloaders, val_dataloaders, datamodule)
File "/usr/local/lib/python3.9/dist-packages/lightning/pytorch/trainer/trainer.py", line 520, in fit
call._call_and_handle_interrupt(
File "/usr/local/lib/python3.9/dist-packages/lightning/pytorch/trainer/call.py", line 44, in _call_and_handle_interrupt
return trainer_fn(*args, **kwargs)
File "/usr/local/lib/python3.9/dist-packages/lightning/pytorch/trainer/trainer.py", line 559, in _fit_impl
self._run(model, ckpt_path=ckpt_path)
File "/usr/local/lib/python3.9/dist-packages/lightning/pytorch/trainer/trainer.py", line 915, in _run
call._call_callback_hooks(self, "on_fit_start")
File "/usr/local/lib/python3.9/dist-packages/lightning/pytorch/trainer/call.py", line 190, in _call_callback_hooks
fn(trainer, trainer.lightning_module, *args, **kwargs)
File "/usr/local/lib/python3.9/dist-packages/lightning/pytorch/callbacks/batch_size_finder.py", line 185, in on_fit_start
self.scale_batch_size(trainer, pl_module)
File "/usr/local/lib/python3.9/dist-packages/lightning/pytorch/callbacks/batch_size_finder.py", line 171, in scale_batch_size
new_size = _scale_batch_size(
File "/usr/local/lib/python3.9/dist-packages/lightning/pytorch/tuner/batch_size_scaling.py", line 83, in _scale_batch_size
new_size = _run_binary_scaling(trainer, new_size, batch_arg_name, max_trials, params)
File "/usr/local/lib/python3.9/dist-packages/lightning/pytorch/tuner/batch_size_scaling.py", line 231, in _run_binary_scaling
_try_loop_run(trainer, params)
File "/usr/local/lib/python3.9/dist-packages/lightning/pytorch/tuner/batch_size_scaling.py", line 331, in _try_loop_run
loop.run()
File "/usr/local/lib/python3.9/dist-packages/lightning/pytorch/loops/fit_loop.py", line 201, in run
self.advance()
File "/usr/local/lib/python3.9/dist-packages/lightning/pytorch/loops/fit_loop.py", line 354, in advance
self.epoch_loop.run(self._data_fetcher)
File "/usr/local/lib/python3.9/dist-packages/lightning/pytorch/loops/training_epoch_loop.py", line 133, in run
self.advance(data_fetcher)
File "/usr/local/lib/python3.9/dist-packages/lightning/pytorch/loops/training_epoch_loop.py", line 220, in advance
batch_output = self.manual_optimization.run(kwargs)
File "/usr/local/lib/python3.9/dist-packages/lightning/pytorch/loops/optimization/manual.py", line 90, in run
self.advance(kwargs)
File "/usr/local/lib/python3.9/dist-packages/lightning/pytorch/loops/optimization/manual.py", line 109, in advance
training_step_output = call._call_strategy_hook(trainer, "training_step", *kwargs.values())
File "/usr/local/lib/python3.9/dist-packages/lightning/pytorch/trainer/call.py", line 288, in _call_strategy_hook
output = fn(*args, **kwargs)
File "/usr/local/lib/python3.9/dist-packages/lightning/pytorch/strategies/strategy.py", line 366, in training_step
return self.model.training_step(*args, **kwargs)
File "/usr/local/lib/python3.9/dist-packages/so_vits_svc_fork/train.py", line 522, in training_step
self.manual_backward(loss_disc_all / accumulate_grad_batches)
File "/usr/local/lib/python3.9/dist-packages/lightning/pytorch/core/module.py", line 1036, in manual_backward
self.trainer.strategy.backward(loss, None, *args, **kwargs)
File "/usr/local/lib/python3.9/dist-packages/lightning/pytorch/strategies/strategy.py", line 199, in backward
self.precision_plugin.backward(closure_loss, self.lightning_module, optimizer, *args, **kwargs)
File "/usr/local/lib/python3.9/dist-packages/lightning/pytorch/plugins/precision/precision_plugin.py", line 67, in backward
model.backward(tensor, *args, **kwargs)
File "/usr/local/lib/python3.9/dist-packages/lightning/pytorch/core/module.py", line 1054, in backward
loss.backward(*args, **kwargs)
File "/usr/local/lib/python3.9/dist-packages/torch/_tensor.py", line 487, in backward
torch.autograd.backward(
File "/usr/local/lib/python3.9/dist-packages/torch/autograd/__init__.py", line 200, in backward
Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
I'm getting this error after running train on Paperspace.
Same error with Anaconda3. (Ubuntu 23.04, NVIDIA GeForce GTX 1660 SUPER, Python 3.9, CUDA 11.6, PyTorch 2.0.0+cu118, conda 22.9.0)
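For context, this RuntimeError is PyTorch's generic message for calling .backward() on a tensor that has no connection to the autograd graph. A minimal stand-alone illustration of just the message (not the so-vits-svc-fork code path):

```python
# Calling .backward() on a tensor that does not require grad (and so has no
# grad_fn) raises exactly this RuntimeError.
import torch

x = torch.randn(3)   # requires_grad defaults to False
loss = x.sum()       # loss therefore has no grad_fn
try:
    loss.backward()
except RuntimeError as e:
    print(e)  # element 0 of tensors does not require grad and does not have a grad_fn
```

In the traceback above, it is the loss passed to self.manual_backward at so_vits_svc_fork/train.py line 522 that ends up in this detached state.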
Full output:
[18:08:56] INFO [18:08:56] NumExpr defaulting to 6 threads. utils.py:159
[18:08:58] INFO [18:08:58] Created a temporary directory at /tmp/tmpl6zxp_iw instantiator.py:21
INFO [18:08:58] Writing /tmp/tmpl6zxp_iw/_remote_module_non_scriptable.py instantiator.py:76
INFO [18:08:58] Server binary (from Python package v0.7.0): /home/okaits/anaconda3/lib/python3.9/site-packages/tensorboard_data_server/bin/server server_ingester.py:290
NOTE: Using experimental fast data loading logic. To disable, pass
"--load_fast=false" and report issues on GitHub. More details:
https://github.com/tensorflow/tensorboard/issues/4784
[18:09:00] INFO [18:09:00] 127.0.0.1 - - [29/Apr/2023 18:09:00] "GET / HTTP/1.1" 200 - _internal.py:224
INFO [18:09:00] 127.0.0.1 - - [29/Apr/2023 18:09:00] "GET /font-roboto/oMMgfZMQthOryQo9n22dcuvvDin1pK8aKteLpeZ5c0A.woff2 HTTP/1.1" 200 - _internal.py:224
[18:09:01] INFO [18:09:01] 127.0.0.1 - - [29/Apr/2023 18:09:01] "GET /icon_bundle.svg HTTP/1.1" 200 - _internal.py:224
INFO [18:09:01] 127.0.0.1 - - [29/Apr/2023 18:09:01] "GET /font-roboto/RxZJdnzeo3R5zSexge8UUZBw1xU1rKptJj_0jans920.woff2 HTTP/1.1" 200 - _internal.py:224
INFO [18:09:01] 127.0.0.1 - - [29/Apr/2023 18:09:01] "GET /data/plugins_listing HTTP/1.1" 200 - _internal.py:224
INFO [18:09:01] 127.0.0.1 - - [29/Apr/2023 18:09:01] "GET /data/environment HTTP/1.1" 200 - _internal.py:224
INFO [18:09:01] 127.0.0.1 - - [29/Apr/2023 18:09:01] "GET /data/runs HTTP/1.1" 200 - _internal.py:224
INFO [18:09:01] 127.0.0.1 - - [29/Apr/2023 18:09:01] "GET /data/environment HTTP/1.1" 200 - _internal.py:224
INFO [18:09:01] 127.0.0.1 - - [29/Apr/2023 18:09:01] "GET /data/runs HTTP/1.1" 200 - _internal.py:224
INFO [18:09:01] 127.0.0.1 - - [29/Apr/2023 18:09:01] "GET /font-roboto/d-6IYplOFocCacKzxwXSOJBw1xU1rKptJj_0jans920.woff2 HTTP/1.1" 200 - _internal.py:224
INFO [18:09:01] 127.0.0.1 - - [29/Apr/2023 18:09:01] "GET /font-roboto/vPcynSL0qHq_6dX7lKVByXYhjbSpvc47ee6xR_80Hnw.woff2 HTTP/1.1" 200 - _internal.py:224
[18:09:05] INFO [18:09:05] Using strategy: auto train.py:88
INFO: GPU available: True (cuda), used: True
[18:09:06] INFO [18:09:06] GPU available: True (cuda), used: True setup.py:163
INFO: TPU available: False, using: 0 TPU cores
INFO [18:09:06] TPU available: False, using: 0 TPU cores setup.py:166
INFO: IPU available: False, using: 0 IPUs
INFO [18:09:06] IPU available: False, using: 0 IPUs setup.py:169
INFO: HPU available: False, using: 0 HPUs
INFO [18:09:06] HPU available: False, using: 0 HPUs setup.py:172
WARNING [18:09:06] /home/okaits/anaconda3/lib/python3.9/site-packages/so_vits_svc_fork/modules/synthesizers.py:81: UserWarning: Unused arguments: {'n_layers_q': 3, 'use_spectral_norm': False} warnings.py:109
warnings.warn(f"Unused arguments: {kwargs}")
INFO [18:09:06] Decoder type: hifi-gan synthesizers.py:100
[18:09:17] WARNING [18:09:17] /home/okaits/anaconda3/lib/python3.9/site-packages/so_vits_svc_fork/utils.py:198: UserWarning: Keys not found in checkpoint state dict:['emb_g.weight'] warnings.py:109
warnings.warn(f"Keys not found in checkpoint state dict:" f"{not_in_from}")
[18:09:23] INFO [18:09:23] Loaded checkpoint 'logs/44k/G_0.pth' (iteration 0) utils.py:259
[18:09:24] INFO [18:09:24] Loaded checkpoint 'logs/44k/D_0.pth' (iteration 0) utils.py:259
INFO: LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
[18:09:30] INFO [18:09:30] LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0] cuda.py:57
WARNING [18:09:30] /home/okaits/anaconda3/lib/python3.9/site-packages/torch/_utils.py:776: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only warnings.py:109
matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
return self.fget.__get__(instance, owner)()
WARNING [18:09:30] /home/okaits/anaconda3/lib/python3.9/site-packages/torch/_utils.py:776: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only warnings.py:109
matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
return self.fget.__get__(instance, owner)()
WARNING [18:09:30] /home/okaits/anaconda3/lib/python3.9/site-packages/torch/_utils.py:776: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only warnings.py:109
matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
return self.fget.__get__(instance, owner)()
[18:09:31] WARNING [18:09:31] /home/okaits/anaconda3/lib/python3.9/site-packages/torch/_utils.py:776: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only warnings.py:109
matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
return self.fget.__get__(instance, owner)()
[18:09:31] WARNING [18:09:31] /home/okaits/anaconda3/lib/python3.9/site-packages/lightning/pytorch/trainer/connectors/data_connector.py:430: PossibleUserWarning: The dataloader, val_dataloader, does not have many workers which may be a warnings.py:109
bottleneck. Consider increasing the value of the `num_workers` argument` (try 6 which is the number of cpus on this machine) in the `DataLoader` init to improve performance.
rank_zero_warn(
WARNING [18:09:31] /home/okaits/anaconda3/lib/python3.9/site-packages/torch/_utils.py:776: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only warnings.py:109
matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
return self.fget.__get__(instance, owner)()
[18:09:53] WARNING [18:09:53] /home/okaits/anaconda3/lib/python3.9/site-packages/torch/functional.py:641: UserWarning: stft with return_complex=False is deprecated. In a future pytorch release, stft will return complex tensors for all inputs, warnings.py:109
and return_complex=False will raise an error.
Note: you can still call torch.view_as_real on the complex output to recover the old return format. (Triggered internally at ../aten/src/ATen/native/SpectralOps.cpp:862.)
return _VF.stft(input, n_fft, hop_length, win_length, window, # type: ignore[attr-defined]
[18:09:54] WARNING [18:09:54] /home/okaits/anaconda3/lib/python3.9/site-packages/so_vits_svc_fork/train.py:387: UserWarning: Logging is only supported with TensorBoardLogger. warnings.py:109
warnings.warn("Logging is only supported with TensorBoardLogger.")
INFO: `Trainer.fit` stopped: `max_steps=1` reached.
[18:09:57] INFO [18:09:57] `Trainer.fit` stopped: `max_steps=1` reached. fit_loop.py:166
INFO: Batch size 2 succeeded, trying batch size 4
INFO [18:09:57] Batch size 2 succeeded, trying batch size 4 batch_size_scaling.py:313
INFO: `Trainer.fit` stopped: `max_steps=1` reached.
[18:10:06] INFO [18:10:06] `Trainer.fit` stopped: `max_steps=1` reached. fit_loop.py:166
INFO: Batch size 4 succeeded, trying batch size 8
INFO [18:10:06] Batch size 4 succeeded, trying batch size 8 batch_size_scaling.py:313
INFO: Batch size 8 failed, trying batch size 6
[18:10:11] INFO [18:10:11] Batch size 8 failed, trying batch size 6 batch_size_scaling.py:313
Traceback (most recent call last):
File "/home/okaits/anaconda3/bin/svc", line 8, in <module>
sys.exit(cli())
File "/home/okaits/anaconda3/lib/python3.9/site-packages/click/core.py", line 1128, in __call__
return self.main(*args, **kwargs)
File "/home/okaits/anaconda3/lib/python3.9/site-packages/click/core.py", line 1053, in main
rv = self.invoke(ctx)
File "/home/okaits/anaconda3/lib/python3.9/site-packages/click/core.py", line 1659, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/home/okaits/anaconda3/lib/python3.9/site-packages/click/core.py", line 1395, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/home/okaits/anaconda3/lib/python3.9/site-packages/click/core.py", line 754, in invoke
return __callback(*args, **kwargs)
File "/home/okaits/anaconda3/lib/python3.9/site-packages/so_vits_svc_fork/__main__.py", line 129, in train
train(
File "/home/okaits/anaconda3/lib/python3.9/site-packages/so_vits_svc_fork/train.py", line 120, in train
tuner.scale_batch_size(
File "/home/okaits/anaconda3/lib/python3.9/site-packages/lightning/pytorch/tuner/tuning.py", line 93, in scale_batch_size
self._trainer.fit(model, train_dataloaders, val_dataloaders, datamodule)
File "/home/okaits/anaconda3/lib/python3.9/site-packages/lightning/pytorch/trainer/trainer.py", line 520, in fit
call._call_and_handle_interrupt(
File "/home/okaits/anaconda3/lib/python3.9/site-packages/lightning/pytorch/trainer/call.py", line 44, in _call_and_handle_interrupt
return trainer_fn(*args, **kwargs)
File "/home/okaits/anaconda3/lib/python3.9/site-packages/lightning/pytorch/trainer/trainer.py", line 559, in _fit_impl
self._run(model, ckpt_path=ckpt_path)
File "/home/okaits/anaconda3/lib/python3.9/site-packages/lightning/pytorch/trainer/trainer.py", line 915, in _run
call._call_callback_hooks(self, "on_fit_start")
File "/home/okaits/anaconda3/lib/python3.9/site-packages/lightning/pytorch/trainer/call.py", line 190, in _call_callback_hooks
fn(trainer, trainer.lightning_module, *args, **kwargs)
File "/home/okaits/anaconda3/lib/python3.9/site-packages/lightning/pytorch/callbacks/batch_size_finder.py", line 185, in on_fit_start
self.scale_batch_size(trainer, pl_module)
File "/home/okaits/anaconda3/lib/python3.9/site-packages/lightning/pytorch/callbacks/batch_size_finder.py", line 171, in scale_batch_size
new_size = _scale_batch_size(
File "/home/okaits/anaconda3/lib/python3.9/site-packages/lightning/pytorch/tuner/batch_size_scaling.py", line 83, in _scale_batch_size
new_size = _run_binary_scaling(trainer, new_size, batch_arg_name, max_trials, params)
File "/home/okaits/anaconda3/lib/python3.9/site-packages/lightning/pytorch/tuner/batch_size_scaling.py", line 231, in _run_binary_scaling
_try_loop_run(trainer, params)
File "/home/okaits/anaconda3/lib/python3.9/site-packages/lightning/pytorch/tuner/batch_size_scaling.py", line 331, in _try_loop_run
loop.run()
File "/home/okaits/anaconda3/lib/python3.9/site-packages/lightning/pytorch/loops/fit_loop.py", line 201, in run
self.advance()
File "/home/okaits/anaconda3/lib/python3.9/site-packages/lightning/pytorch/loops/fit_loop.py", line 354, in advance
self.epoch_loop.run(self._data_fetcher)
File "/home/okaits/anaconda3/lib/python3.9/site-packages/lightning/pytorch/loops/training_epoch_loop.py", line 133, in run
self.advance(data_fetcher)
File "/home/okaits/anaconda3/lib/python3.9/site-packages/lightning/pytorch/loops/training_epoch_loop.py", line 220, in advance
batch_output = self.manual_optimization.run(kwargs)
File "/home/okaits/anaconda3/lib/python3.9/site-packages/lightning/pytorch/loops/optimization/manual.py", line 90, in run
self.advance(kwargs)
File "/home/okaits/anaconda3/lib/python3.9/site-packages/lightning/pytorch/loops/optimization/manual.py", line 109, in advance
training_step_output = call._call_strategy_hook(trainer, "training_step", *kwargs.values())
File "/home/okaits/anaconda3/lib/python3.9/site-packages/lightning/pytorch/trainer/call.py", line 288, in _call_strategy_hook
output = fn(*args, **kwargs)
File "/home/okaits/anaconda3/lib/python3.9/site-packages/lightning/pytorch/strategies/strategy.py", line 366, in training_step
return self.model.training_step(*args, **kwargs)
File "/home/okaits/anaconda3/lib/python3.9/site-packages/so_vits_svc_fork/train.py", line 522, in training_step
self.manual_backward(loss_disc_all / accumulate_grad_batches)
File "/home/okaits/anaconda3/lib/python3.9/site-packages/lightning/pytorch/core/module.py", line 1036, in manual_backward
self.trainer.strategy.backward(loss, None, *args, **kwargs)
File "/home/okaits/anaconda3/lib/python3.9/site-packages/lightning/pytorch/strategies/strategy.py", line 199, in backward
self.precision_plugin.backward(closure_loss, self.lightning_module, optimizer, *args, **kwargs)
File "/home/okaits/anaconda3/lib/python3.9/site-packages/lightning/pytorch/plugins/precision/precision_plugin.py", line 67, in backward
model.backward(tensor, *args, **kwargs)
File "/home/okaits/anaconda3/lib/python3.9/site-packages/lightning/pytorch/core/module.py", line 1054, in backward
loss.backward(*args, **kwargs)
File "/home/okaits/anaconda3/lib/python3.9/site-packages/torch/_tensor.py", line 487, in backward
torch.autograd.backward(
File "/home/okaits/anaconda3/lib/python3.9/site-packages/torch/autograd/__init__.py", line 200, in backward
Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
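For context, the "Batch size N succeeded/failed" lines above come from Lightning's batch-size finder, which so_vits_svc_fork/train.py invokes via tuner.scale_batch_size (line 120 in the traceback) when the batch size is left on automatic; the crash happens inside one of its trial fits. A self-contained sketch of that mechanism, with purely illustrative names (not so-vits-svc-fork's actual model):

```python
# Lightning's batch-size finder runs a short fit at each candidate size and
# backs off when one fails, producing "succeeded, trying ..." / "failed,
# trying ..." log lines like the ones above.
import torch
import lightning.pytorch as pl
from lightning.pytorch.tuner import Tuner
from torch.utils.data import DataLoader, TensorDataset


class ToyModule(pl.LightningModule):
    def __init__(self, batch_size: int = 2):
        super().__init__()
        self.batch_size = batch_size  # the attribute the tuner rescales
        self.layer = torch.nn.Linear(16, 1)

    def train_dataloader(self):
        data = TensorDataset(torch.randn(256, 16), torch.randn(256, 1))
        return DataLoader(data, batch_size=self.batch_size)

    def training_step(self, batch, batch_idx):
        x, y = batch
        return torch.nn.functional.mse_loss(self.layer(x), y)

    def configure_optimizers(self):
        return torch.optim.SGD(self.parameters(), lr=1e-2)


trainer = pl.Trainer(max_epochs=1, logger=False, enable_checkpointing=False)
print(Tuner(trainer).scale_batch_size(ToyModule(), mode="binsearch"))
```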
I don't know how, but somehow training started without any issues this time.
I tried it on my Windows 11 machine (CUDA 12.1) and it failed with the same error, so I think the cause is something on my end: my hardware, CUDA, or my own mistake. I also tried with Docker, and that failed too.
(I used DeepL Write to proofread my text, but my English may still be incorrect...)
Training started without any errors after disabling the GPU in Docker, but now it's too slow...
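For reference, an alternative to removing the GPU at the Docker level is to hide the CUDA devices from the process itself; a small sketch of that approach (the env-var route is my assumption, not what the comment above did):

```python
# Hiding all CUDA devices forces PyTorch (and therefore training) onto the CPU.
# The variable must be set before CUDA is initialized, e.g. before importing
# torch, or exported in the shell as CUDA_VISIBLE_DEVICES="" before running svc.
import os

os.environ["CUDA_VISIBLE_DEVICES"] = ""

import torch  # imported after the env var on purpose

print(torch.cuda.is_available())  # False: no GPU visible, CPU fallback
```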
I got this error on my PC when batch_size was set to auto. I set it to 4 and the error is gone.
Oh, it worked!
Yes, for me it was 8. Confirmed it's caused by "auto".
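For anyone else landing here: the workaround in this thread is to replace the automatic batch size with a fixed integer so the Lightning batch-size finder never runs. A minimal sketch of that edit, assuming the usual so-vits-svc-fork layout (configs/44k/config.json with a train.batch_size key; adjust the path and value to your setup):

```python
# Pin batch_size to a fixed value instead of "auto"; editing the JSON by hand
# works just as well, this only automates it.
import json
from pathlib import Path

config_path = Path("configs/44k/config.json")  # assumed default location
config = json.loads(config_path.read_text())
config["train"]["batch_size"] = 4  # pick whatever fits your GPU memory
config_path.write_text(json.dumps(config, indent=2))
```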