
RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn

Open pierluigizagaria opened this issue 1 year ago • 7 comments

Traceback (most recent call last):
  File "/usr/local/bin/svc", line 8, in <module>
    sys.exit(cli())
  File "/usr/local/lib/python3.9/dist-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.9/dist-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.9/dist-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.9/dist-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.9/dist-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/usr/local/lib/python3.9/dist-packages/so_vits_svc_fork/__main__.py", line 129, in train
    train(
  File "/usr/local/lib/python3.9/dist-packages/so_vits_svc_fork/train.py", line 120, in train
    tuner.scale_batch_size(
  File "/usr/local/lib/python3.9/dist-packages/lightning/pytorch/tuner/tuning.py", line 93, in scale_batch_size
    self._trainer.fit(model, train_dataloaders, val_dataloaders, datamodule)
  File "/usr/local/lib/python3.9/dist-packages/lightning/pytorch/trainer/trainer.py", line 520, in fit
    call._call_and_handle_interrupt(
  File "/usr/local/lib/python3.9/dist-packages/lightning/pytorch/trainer/call.py", line 44, in _call_and_handle_interrupt
    return trainer_fn(*args, **kwargs)
  File "/usr/local/lib/python3.9/dist-packages/lightning/pytorch/trainer/trainer.py", line 559, in _fit_impl
    self._run(model, ckpt_path=ckpt_path)
  File "/usr/local/lib/python3.9/dist-packages/lightning/pytorch/trainer/trainer.py", line 915, in _run
    call._call_callback_hooks(self, "on_fit_start")
  File "/usr/local/lib/python3.9/dist-packages/lightning/pytorch/trainer/call.py", line 190, in _call_callback_hooks
    fn(trainer, trainer.lightning_module, *args, **kwargs)
  File "/usr/local/lib/python3.9/dist-packages/lightning/pytorch/callbacks/batch_size_finder.py", line 185, in on_fit_start
    self.scale_batch_size(trainer, pl_module)
  File "/usr/local/lib/python3.9/dist-packages/lightning/pytorch/callbacks/batch_size_finder.py", line 171, in scale_batch_size
    new_size = _scale_batch_size(
  File "/usr/local/lib/python3.9/dist-packages/lightning/pytorch/tuner/batch_size_scaling.py", line 83, in _scale_batch_size
    new_size = _run_binary_scaling(trainer, new_size, batch_arg_name, max_trials, params)
  File "/usr/local/lib/python3.9/dist-packages/lightning/pytorch/tuner/batch_size_scaling.py", line 231, in _run_binary_scaling
    _try_loop_run(trainer, params)
  File "/usr/local/lib/python3.9/dist-packages/lightning/pytorch/tuner/batch_size_scaling.py", line 331, in _try_loop_run
    loop.run()
  File "/usr/local/lib/python3.9/dist-packages/lightning/pytorch/loops/fit_loop.py", line 201, in run
    self.advance()
  File "/usr/local/lib/python3.9/dist-packages/lightning/pytorch/loops/fit_loop.py", line 354, in advance
    self.epoch_loop.run(self._data_fetcher)
  File "/usr/local/lib/python3.9/dist-packages/lightning/pytorch/loops/training_epoch_loop.py", line 133, in run
    self.advance(data_fetcher)
  File "/usr/local/lib/python3.9/dist-packages/lightning/pytorch/loops/training_epoch_loop.py", line 220, in advance
    batch_output = self.manual_optimization.run(kwargs)
  File "/usr/local/lib/python3.9/dist-packages/lightning/pytorch/loops/optimization/manual.py", line 90, in run
    self.advance(kwargs)
  File "/usr/local/lib/python3.9/dist-packages/lightning/pytorch/loops/optimization/manual.py", line 109, in advance
    training_step_output = call._call_strategy_hook(trainer, "training_step", *kwargs.values())
  File "/usr/local/lib/python3.9/dist-packages/lightning/pytorch/trainer/call.py", line 288, in _call_strategy_hook
    output = fn(*args, **kwargs)
  File "/usr/local/lib/python3.9/dist-packages/lightning/pytorch/strategies/strategy.py", line 366, in training_step
    return self.model.training_step(*args, **kwargs)
  File "/usr/local/lib/python3.9/dist-packages/so_vits_svc_fork/train.py", line 522, in training_step
    self.manual_backward(loss_disc_all / accumulate_grad_batches)
  File "/usr/local/lib/python3.9/dist-packages/lightning/pytorch/core/module.py", line 1036, in manual_backward
    self.trainer.strategy.backward(loss, None, *args, **kwargs)
  File "/usr/local/lib/python3.9/dist-packages/lightning/pytorch/strategies/strategy.py", line 199, in backward
    self.precision_plugin.backward(closure_loss, self.lightning_module, optimizer, *args, **kwargs)
  File "/usr/local/lib/python3.9/dist-packages/lightning/pytorch/plugins/precision/precision_plugin.py", line 67, in backward
    model.backward(tensor, *args, **kwargs)
  File "/usr/local/lib/python3.9/dist-packages/lightning/pytorch/core/module.py", line 1054, in backward
    loss.backward(*args, **kwargs)
  File "/usr/local/lib/python3.9/dist-packages/torch/_tensor.py", line 487, in backward
    torch.autograd.backward(
  File "/usr/local/lib/python3.9/dist-packages/torch/autograd/__init__.py", line 200, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
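For context, the error message itself is generic PyTorch autograd behaviour: calling `backward()` on a tensor that has `requires_grad=False` and no `grad_fn` raises exactly this `RuntimeError`. A minimal sketch, unrelated to the so-vits-svc-fork internals, just to show what triggers the message:

```python
import torch

# A tensor created directly (or detached from the graph) has
# requires_grad=False and no grad_fn, so backward() cannot run.
loss = torch.tensor(1.0)

try:
    loss.backward()
except RuntimeError as e:
    # Same message as in the traceback above:
    # "element 0 of tensors does not require grad and does not have a grad_fn"
    print(e)
```

In the traceback, the failing tensor is `loss_disc_all / accumulate_grad_batches` inside `training_step`, which suggests (this is an assumption, not confirmed in the thread) that the discriminator loss ends up detached from the computation graph during one of the batch-size tuner's trial runs.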

I'm getting this error after running `svc train` on Paperspace.

pierluigizagaria avatar Apr 28 '23 22:04 pierluigizagaria

Same error on Anaconda3 (Ubuntu 23.04, NVIDIA GeForce GTX 1660 SUPER, Python 3.9, CUDA 11.6, PyTorch 2.0.0+cu118, conda 22.9.0).

Full output:
[18:08:56] INFO     [18:08:56] NumExpr defaulting to 6 threads.                                                                                                                                                                                           utils.py:159
[18:08:58] INFO     [18:08:58] Created a temporary directory at /tmp/tmpl6zxp_iw                                                                                                                                                                    instantiator.py:21
           INFO     [18:08:58] Writing /tmp/tmpl6zxp_iw/_remote_module_non_scriptable.py                                                                                                                                                            instantiator.py:76
           INFO     [18:08:58] Server binary (from Python package v0.7.0): /home/okaits/anaconda3/lib/python3.9/site-packages/tensorboard_data_server/bin/server                                                                                server_ingester.py:290

NOTE: Using experimental fast data loading logic. To disable, pass
    "--load_fast=false" and report issues on GitHub. More details:
    https://github.com/tensorflow/tensorboard/issues/4784

[18:09:00] INFO     [18:09:00] 127.0.0.1 - - [29/Apr/2023 18:09:00] "GET / HTTP/1.1" 200 -                                                                                                                                                            _internal.py:224
           INFO     [18:09:00] 127.0.0.1 - - [29/Apr/2023 18:09:00] "GET /font-roboto/oMMgfZMQthOryQo9n22dcuvvDin1pK8aKteLpeZ5c0A.woff2 HTTP/1.1" 200 -                                                                                               _internal.py:224
[18:09:01] INFO     [18:09:01] 127.0.0.1 - - [29/Apr/2023 18:09:01] "GET /icon_bundle.svg HTTP/1.1" 200 -                                                                                                                                             _internal.py:224
           INFO     [18:09:01] 127.0.0.1 - - [29/Apr/2023 18:09:01] "GET /font-roboto/RxZJdnzeo3R5zSexge8UUZBw1xU1rKptJj_0jans920.woff2 HTTP/1.1" 200 -                                                                                               _internal.py:224
           INFO     [18:09:01] 127.0.0.1 - - [29/Apr/2023 18:09:01] "GET /data/plugins_listing HTTP/1.1" 200 -                                                                                                                                        _internal.py:224
           INFO     [18:09:01] 127.0.0.1 - - [29/Apr/2023 18:09:01] "GET /data/environment HTTP/1.1" 200 -                                                                                                                                            _internal.py:224
           INFO     [18:09:01] 127.0.0.1 - - [29/Apr/2023 18:09:01] "GET /data/runs HTTP/1.1" 200 -                                                                                                                                                   _internal.py:224
           INFO     [18:09:01] 127.0.0.1 - - [29/Apr/2023 18:09:01] "GET /data/environment HTTP/1.1" 200 -                                                                                                                                            _internal.py:224
           INFO     [18:09:01] 127.0.0.1 - - [29/Apr/2023 18:09:01] "GET /data/runs HTTP/1.1" 200 -                                                                                                                                                   _internal.py:224
           INFO     [18:09:01] 127.0.0.1 - - [29/Apr/2023 18:09:01] "GET /font-roboto/d-6IYplOFocCacKzxwXSOJBw1xU1rKptJj_0jans920.woff2 HTTP/1.1" 200 -                                                                                               _internal.py:224
           INFO     [18:09:01] 127.0.0.1 - - [29/Apr/2023 18:09:01] "GET /font-roboto/vPcynSL0qHq_6dX7lKVByXYhjbSpvc47ee6xR_80Hnw.woff2 HTTP/1.1" 200 -                                                                                               _internal.py:224
[18:09:05] INFO     [18:09:05] Using strategy: auto                                                                                                                                                                                                        train.py:88
INFO: GPU available: True (cuda), used: True
[18:09:06] INFO     [18:09:06] GPU available: True (cuda), used: True                                                                                                                                                                                     setup.py:163
INFO: TPU available: False, using: 0 TPU cores
           INFO     [18:09:06] TPU available: False, using: 0 TPU cores                                                                                                                                                                                   setup.py:166
INFO: IPU available: False, using: 0 IPUs
           INFO     [18:09:06] IPU available: False, using: 0 IPUs                                                                                                                                                                                        setup.py:169
INFO: HPU available: False, using: 0 HPUs
           INFO     [18:09:06] HPU available: False, using: 0 HPUs                                                                                                                                                                                        setup.py:172
           WARNING  [18:09:06] /home/okaits/anaconda3/lib/python3.9/site-packages/so_vits_svc_fork/modules/synthesizers.py:81: UserWarning: Unused arguments: {'n_layers_q': 3, 'use_spectral_norm': False}                                            warnings.py:109
                      warnings.warn(f"Unused arguments: {kwargs}")                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                      
           INFO     [18:09:06] Decoder type: hifi-gan                                                                                                                                                                                              synthesizers.py:100
[18:09:17] WARNING  [18:09:17] /home/okaits/anaconda3/lib/python3.9/site-packages/so_vits_svc_fork/utils.py:198: UserWarning: Keys not found in checkpoint state dict:['emb_g.weight']                                                                 warnings.py:109
                      warnings.warn(f"Keys not found in checkpoint state dict:" f"{not_in_from}")                                                                                                                                                                     
                                                                                                                                                                                                                                                                      
[18:09:23] INFO     [18:09:23] Loaded checkpoint 'logs/44k/G_0.pth' (iteration 0)                                                                                                                                                                         utils.py:259
[18:09:24] INFO     [18:09:24] Loaded checkpoint 'logs/44k/D_0.pth' (iteration 0)                                                                                                                                                                         utils.py:259
INFO: LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
[18:09:30] INFO     [18:09:30] LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]                                                                                                                                                                                    cuda.py:57
           WARNING  [18:09:30] /home/okaits/anaconda3/lib/python3.9/site-packages/torch/_utils.py:776: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only   warnings.py:109
                    matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()                                                                                                    
                      return self.fget.__get__(instance, owner)()                                                                                                                                                                                                     
                                                                                                                                                                                                                                                                      
           WARNING  [18:09:30] /home/okaits/anaconda3/lib/python3.9/site-packages/torch/_utils.py:776: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only   warnings.py:109
                    matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()                                                                                                    
                      return self.fget.__get__(instance, owner)()                                                                                                                                                                                                     
                                                                                                                                                                                                                                                                      
           WARNING  [18:09:30] /home/okaits/anaconda3/lib/python3.9/site-packages/torch/_utils.py:776: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only   warnings.py:109
                    matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()                                                                                                    
                      return self.fget.__get__(instance, owner)()                                                                                                                                                                                                     
                                                                                                                                                                                                                                                                      
[18:09:31] WARNING  [18:09:31] /home/okaits/anaconda3/lib/python3.9/site-packages/torch/_utils.py:776: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only   warnings.py:109
                    matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()                                                                                                    
                      return self.fget.__get__(instance, owner)()                                                                                                                                                                                                     
                                                                                                                                                                                                                                                                      
[18:09:31] WARNING  [18:09:31] /home/okaits/anaconda3/lib/python3.9/site-packages/lightning/pytorch/trainer/connectors/data_connector.py:430: PossibleUserWarning: The dataloader, val_dataloader, does not have many workers which may be a           warnings.py:109
                    bottleneck. Consider increasing the value of the `num_workers` argument` (try 6 which is the number of cpus on this machine) in the `DataLoader` init to improve performance.                                                                     
                      rank_zero_warn(                                                                                                                                                                                                                                 
                                                                                                                                                                                                                                                                      
           WARNING  [18:09:31] /home/okaits/anaconda3/lib/python3.9/site-packages/torch/_utils.py:776: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only   warnings.py:109
                    matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()                                                                                                    
                      return self.fget.__get__(instance, owner)()                                                                                                                                                                                                     
                                                                                                                                                                                                                                                                      
[18:09:53] WARNING  [18:09:53] /home/okaits/anaconda3/lib/python3.9/site-packages/torch/functional.py:641: UserWarning: stft with return_complex=False is deprecated. In a future pytorch release, stft will return complex tensors for all inputs,    warnings.py:109
                    and return_complex=False will raise an error.                                                                                                                                                                                                     
                    Note: you can still call torch.view_as_real on the complex output to recover the old return format. (Triggered internally at ../aten/src/ATen/native/SpectralOps.cpp:862.)                                                                        
                      return _VF.stft(input, n_fft, hop_length, win_length, window,  # type: ignore[attr-defined]                                                                                                                                                     
                                                                                                                                                                                                                                                                      
[18:09:54] WARNING  [18:09:54] /home/okaits/anaconda3/lib/python3.9/site-packages/so_vits_svc_fork/train.py:387: UserWarning: Logging is only supported with TensorBoardLogger.                                                                        warnings.py:109
                      warnings.warn("Logging is only supported with TensorBoardLogger.")                                                                                                                                                                              
                                                                                                                                                                                                                                                                      
INFO: `Trainer.fit` stopped: `max_steps=1` reached.
[18:09:57] INFO     [18:09:57] `Trainer.fit` stopped: `max_steps=1` reached.                                                                                                                                                                           fit_loop.py:166
INFO: Batch size 2 succeeded, trying batch size 4
           INFO     [18:09:57] Batch size 2 succeeded, trying batch size 4                                                                                                                                                                   batch_size_scaling.py:313
INFO: `Trainer.fit` stopped: `max_steps=1` reached.
[18:10:06] INFO     [18:10:06] `Trainer.fit` stopped: `max_steps=1` reached.                                                                                                                                                                           fit_loop.py:166
INFO: Batch size 4 succeeded, trying batch size 8
           INFO     [18:10:06] Batch size 4 succeeded, trying batch size 8                                                                                                                                                                   batch_size_scaling.py:313
INFO: Batch size 8 failed, trying batch size 6
[18:10:11] INFO     [18:10:11] Batch size 8 failed, trying batch size 6                                                                                                                                                                      batch_size_scaling.py:313
Traceback (most recent call last):
  File "/home/okaits/anaconda3/bin/svc", line 8, in <module>
    sys.exit(cli())
  File "/home/okaits/anaconda3/lib/python3.9/site-packages/click/core.py", line 1128, in __call__
    return self.main(*args, **kwargs)
  File "/home/okaits/anaconda3/lib/python3.9/site-packages/click/core.py", line 1053, in main
    rv = self.invoke(ctx)
  File "/home/okaits/anaconda3/lib/python3.9/site-packages/click/core.py", line 1659, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/okaits/anaconda3/lib/python3.9/site-packages/click/core.py", line 1395, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/okaits/anaconda3/lib/python3.9/site-packages/click/core.py", line 754, in invoke
    return __callback(*args, **kwargs)
  File "/home/okaits/anaconda3/lib/python3.9/site-packages/so_vits_svc_fork/__main__.py", line 129, in train
    train(
  File "/home/okaits/anaconda3/lib/python3.9/site-packages/so_vits_svc_fork/train.py", line 120, in train
    tuner.scale_batch_size(
  File "/home/okaits/anaconda3/lib/python3.9/site-packages/lightning/pytorch/tuner/tuning.py", line 93, in scale_batch_size
    self._trainer.fit(model, train_dataloaders, val_dataloaders, datamodule)
  File "/home/okaits/anaconda3/lib/python3.9/site-packages/lightning/pytorch/trainer/trainer.py", line 520, in fit
    call._call_and_handle_interrupt(
  File "/home/okaits/anaconda3/lib/python3.9/site-packages/lightning/pytorch/trainer/call.py", line 44, in _call_and_handle_interrupt
    return trainer_fn(*args, **kwargs)
  File "/home/okaits/anaconda3/lib/python3.9/site-packages/lightning/pytorch/trainer/trainer.py", line 559, in _fit_impl
    self._run(model, ckpt_path=ckpt_path)
  File "/home/okaits/anaconda3/lib/python3.9/site-packages/lightning/pytorch/trainer/trainer.py", line 915, in _run
    call._call_callback_hooks(self, "on_fit_start")
  File "/home/okaits/anaconda3/lib/python3.9/site-packages/lightning/pytorch/trainer/call.py", line 190, in _call_callback_hooks
    fn(trainer, trainer.lightning_module, *args, **kwargs)
  File "/home/okaits/anaconda3/lib/python3.9/site-packages/lightning/pytorch/callbacks/batch_size_finder.py", line 185, in on_fit_start
    self.scale_batch_size(trainer, pl_module)
  File "/home/okaits/anaconda3/lib/python3.9/site-packages/lightning/pytorch/callbacks/batch_size_finder.py", line 171, in scale_batch_size
    new_size = _scale_batch_size(
  File "/home/okaits/anaconda3/lib/python3.9/site-packages/lightning/pytorch/tuner/batch_size_scaling.py", line 83, in _scale_batch_size
    new_size = _run_binary_scaling(trainer, new_size, batch_arg_name, max_trials, params)
  File "/home/okaits/anaconda3/lib/python3.9/site-packages/lightning/pytorch/tuner/batch_size_scaling.py", line 231, in _run_binary_scaling
    _try_loop_run(trainer, params)
  File "/home/okaits/anaconda3/lib/python3.9/site-packages/lightning/pytorch/tuner/batch_size_scaling.py", line 331, in _try_loop_run
    loop.run()
  File "/home/okaits/anaconda3/lib/python3.9/site-packages/lightning/pytorch/loops/fit_loop.py", line 201, in run
    self.advance()
  File "/home/okaits/anaconda3/lib/python3.9/site-packages/lightning/pytorch/loops/fit_loop.py", line 354, in advance
    self.epoch_loop.run(self._data_fetcher)
  File "/home/okaits/anaconda3/lib/python3.9/site-packages/lightning/pytorch/loops/training_epoch_loop.py", line 133, in run
    self.advance(data_fetcher)
  File "/home/okaits/anaconda3/lib/python3.9/site-packages/lightning/pytorch/loops/training_epoch_loop.py", line 220, in advance
    batch_output = self.manual_optimization.run(kwargs)
  File "/home/okaits/anaconda3/lib/python3.9/site-packages/lightning/pytorch/loops/optimization/manual.py", line 90, in run
    self.advance(kwargs)
  File "/home/okaits/anaconda3/lib/python3.9/site-packages/lightning/pytorch/loops/optimization/manual.py", line 109, in advance
    training_step_output = call._call_strategy_hook(trainer, "training_step", *kwargs.values())
  File "/home/okaits/anaconda3/lib/python3.9/site-packages/lightning/pytorch/trainer/call.py", line 288, in _call_strategy_hook
    output = fn(*args, **kwargs)
  File "/home/okaits/anaconda3/lib/python3.9/site-packages/lightning/pytorch/strategies/strategy.py", line 366, in training_step
    return self.model.training_step(*args, **kwargs)
  File "/home/okaits/anaconda3/lib/python3.9/site-packages/so_vits_svc_fork/train.py", line 522, in training_step
    self.manual_backward(loss_disc_all / accumulate_grad_batches)
  File "/home/okaits/anaconda3/lib/python3.9/site-packages/lightning/pytorch/core/module.py", line 1036, in manual_backward
    self.trainer.strategy.backward(loss, None, *args, **kwargs)
  File "/home/okaits/anaconda3/lib/python3.9/site-packages/lightning/pytorch/strategies/strategy.py", line 199, in backward
    self.precision_plugin.backward(closure_loss, self.lightning_module, optimizer, *args, **kwargs)
  File "/home/okaits/anaconda3/lib/python3.9/site-packages/lightning/pytorch/plugins/precision/precision_plugin.py", line 67, in backward
    model.backward(tensor, *args, **kwargs)
  File "/home/okaits/anaconda3/lib/python3.9/site-packages/lightning/pytorch/core/module.py", line 1054, in backward
    loss.backward(*args, **kwargs)
  File "/home/okaits/anaconda3/lib/python3.9/site-packages/torch/_tensor.py", line 487, in backward
    torch.autograd.backward(
  File "/home/okaits/anaconda3/lib/python3.9/site-packages/torch/autograd/__init__.py", line 200, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn

okaits avatar Apr 29 '23 09:04 okaits

I don't know how, but somehow training started without any issues.

pierluigizagaria avatar Apr 29 '23 10:04 pierluigizagaria

I tried it on my Windows 11 machine (CUDA 12.1) and it failed with the same error, so I think this error is caused by something in my hardware, my CUDA setup, or my own mistake. I also tried with Docker, which failed as well.

(I used DeepL Write to proofread this text, but my English may still be incorrect...)

okaits avatar Apr 30 '23 08:04 okaits

Training started without any errors after disabling the GPU in Docker. But now it's too slow...

okaits avatar Apr 30 '23 12:04 okaits

I got this error on my PC when `batch_size` was set to `auto`. I set it to 4 and the error is gone.

megapro17 avatar May 08 '23 18:05 megapro17
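In other words, the workaround is to replace the `auto` value with a fixed number in the `train` section of `config.json`. A sketch of the relevant fragment, assuming the usual so-vits-svc-fork config layout (your file will contain many other keys):

```json
{
  "train": {
    "batch_size": 4
  }
}
```

Setting an explicit value skips Lightning's batch-size tuner (`tuner.scale_batch_size` in the traceback), which is where the error is raised.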

Oh, it worked!

okaits avatar May 11 '23 14:05 okaits

> I got this error on my PC when `batch_size` was set to `auto`. I set it to 4 and the error is gone.

Yes, for me it's 8. Confirmed: the error is caused by `auto`.

lightsing avatar Jun 03 '23 11:06 lightsing