
TypeError: forward() missing 1 required positional argument: 'inputs' when training

mmmmllll1 opened this issue on May 29, 2023

I'm trying to run training and I'm getting the following error:

Exception in thread Thread-19:
Traceback (most recent call last):
  File "/usr/lib/python3.7/threading.py", line 926, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.7/threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "/ssd/github/BenAAndrew/Voice-Cloning-App/application/utils.py", line 66, in background_task
    raise e
  File "/ssd/github/BenAAndrew/Voice-Cloning-App/application/utils.py", line 62, in background_task
    func(logging=logger, **kwargs)
  File "/ssd/github/BenAAndrew/Voice-Cloning-App/training/train.py", line 219, in train
    y, y_pred = process_batch(batch, model)
  File "/ssd/github/BenAAndrew/Voice-Cloning-App/training/tacotron2_model/utils.py", line 88, in process_batch
    y_pred = model(batch, mask_size=output_length_size, alignment_mask_size=input_length_size)
  File "/ssd/github/BenAAndrew/Voice-Cloning-App/venv/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/ssd/github/BenAAndrew/Voice-Cloning-App/venv/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 168, in forward
    outputs = self.parallel_apply(replicas, inputs, kwargs)
  File "/ssd/github/BenAAndrew/Voice-Cloning-App/venv/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 178, in parallel_apply
    return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
  File "/ssd/github/BenAAndrew/Voice-Cloning-App/venv/lib/python3.7/site-packages/torch/nn/parallel/parallel_apply.py", line 86, in parallel_apply
    output.reraise()
  File "/ssd/github/BenAAndrew/Voice-Cloning-App/venv/lib/python3.7/site-packages/torch/_utils.py", line 425, in reraise
    raise self.exc_type(msg)
TypeError: Caught TypeError in replica 6 on device 6.
Original Traceback (most recent call last):
  File "/ssd/github/BenAAndrew/Voice-Cloning-App/venv/lib/python3.7/site-packages/torch/nn/parallel/parallel_apply.py", line 61, in _worker
    output = module(*input, **kwargs)
  File "/ssd/github/BenAAndrew/Voice-Cloning-App/venv/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
TypeError: forward() missing 1 required positional argument: 'inputs'

A quick Google search makes it seem like this might be related to https://github.com/pytorch/pytorch/issues/31460?
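
If I'm reading that issue right, nn.DataParallel chunks positional tensor arguments along dim 0 across the devices but copies non-tensor keyword arguments (like mask_size) to every device, and it pads any missing positional chunks with empty tuples. So if the batch scatters into fewer chunks than there are devices, the surplus replicas get called with kwargs only and forward() never receives 'inputs', which would match the "Caught TypeError in replica 6 on device 6" above. Here's a minimal sketch (untested, hypothetical Toy module, needs at least 2 GPUs) of what I suspect is happening:

import torch
import torch.nn as nn

class Toy(nn.Module):
    # Same shape as the Tacotron2 call: one positional arg plus kwargs.
    def forward(self, inputs, mask_size=None):
        return inputs * 2.0

n_gpus = torch.cuda.device_count()
model = nn.DataParallel(Toy().cuda(), device_ids=list(range(n_gpus)))

# A batch with fewer rows than devices scatters into fewer positional
# chunks than there are kwarg copies, so the padded replicas receive
# no positional arguments at all.
small_batch = torch.randn(n_gpus - 1, 4).cuda()
model(small_batch, mask_size=10)
# TypeError: forward() missing 1 required positional argument: 'inputs'

With a batch size of 128 on 8 GPUs the split should normally be even, so maybe it's the last, smaller batch of the epoch (or one of the tensors inside the batch tuple) that ends up splitting into fewer than 8 chunks?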

I had to reduce the batch size to 128; otherwise the first GPU runs out of memory with the dreaded "CUDA out of memory" error (presumably because DataParallel gathers the outputs onto the first device, which would explain why GPU 0 uses the most memory in the nvidia-smi output below).

This machine has 8x NVIDIA V100 16GB GPUs in it (I've excluded the T1000 using CUDA_VISIBLE_DEVICES). See the nvidia-smi output below:

+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 530.30.02              Driver Version: 530.30.02    CUDA Version: 12.1     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                  Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf            Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  Tesla V100-SXM2-16GB            On | 00000000:0D:00.0 Off |                    0 |
| N/A   40C    P0               56W / 300W|  16147MiB / 16384MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   1  Tesla V100-SXM2-16GB            On | 00000000:0E:00.0 Off |                    0 |
| N/A   38C    P0               57W / 300W|  12529MiB / 16384MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   2  Tesla V100-SXM2-16GB            On | 00000000:14:00.0 Off |                    0 |
| N/A   36C    P0               57W / 300W|  11233MiB / 16384MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   3  Tesla V100-SXM2-16GB            On | 00000000:15:00.0 Off |                    0 |
| N/A   39C    P0               57W / 300W|  10655MiB / 16384MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   4  NVIDIA T1000 8GB                On | 00000000:82:00.0 Off |                  N/A |
| 35%   33C    P8               N/A /  50W|      6MiB /  8192MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   5  Tesla V100-SXM2-16GB            On | 00000000:8B:00.0 Off |                    0 |
| N/A   37C    P0               40W / 300W|      4MiB / 16384MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   6  Tesla V100-SXM2-16GB            On | 00000000:8C:00.0 Off |                    0 |
| N/A   34C    P0               41W / 300W|      4MiB / 16384MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   7  Tesla V100-SXM2-16GB            On | 00000000:8F:00.0 Off |                    0 |
| N/A   36C    P0               38W / 300W|      4MiB / 16384MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   8  Tesla V100-SXM2-16GB            On | 00000000:90:00.0 Off |                    0 |
| N/A   37C    P0               40W / 300W|      4MiB / 16384MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
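
For completeness, I set CUDA_VISIBLE_DEVICES in the shell before launching the app; the effect is roughly the same as:

import os

# Hide the T1000 (index 4 in the nvidia-smi output above). This has to
# happen before torch initialises CUDA.
os.environ["CUDA_VISIBLE_DEVICES"] = "0,1,2,3,5,6,7,8"

import torch
print(torch.cuda.device_count())  # 8: the V100s, renumbered 0-7

Note the renumbering: with the T1000 hidden, "device 6" in the traceback doesn't correspond to GPU 6 in nvidia-smi.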

I tried limiting the number of GPUs to 4 with a batch size of 64, and that seems to work. Output below:

Mon May 29 02:17:19 2023
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 530.30.02              Driver Version: 530.30.02    CUDA Version: 12.1     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                  Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf            Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  Tesla V100-SXM2-16GB            On | 00000000:0D:00.0 Off |                    0 |
| N/A   43C    P0               75W / 300W|  11675MiB / 16384MiB |     18%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   1  Tesla V100-SXM2-16GB            On | 00000000:0E:00.0 Off |                    0 |
| N/A   40C    P0               74W / 300W|   7899MiB / 16384MiB |     17%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   2  Tesla V100-SXM2-16GB            On | 00000000:14:00.0 Off |                    0 |
| N/A   38C    P0               72W / 300W|   7263MiB / 16384MiB |     17%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   3  Tesla V100-SXM2-16GB            On | 00000000:15:00.0 Off |                    0 |
| N/A   42C    P0               75W / 300W|   6923MiB / 16384MiB |     15%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   4  NVIDIA T1000 8GB                On | 00000000:82:00.0 Off |                  N/A |
| 35%   32C    P8               N/A /  50W|      6MiB /  8192MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   5  Tesla V100-SXM2-16GB            On | 00000000:8B:00.0 Off |                    0 |
| N/A   37C    P0               40W / 300W|      4MiB / 16384MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   6  Tesla V100-SXM2-16GB            On | 00000000:8C:00.0 Off |                    0 |
| N/A   34C    P0               41W / 300W|      4MiB / 16384MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   7  Tesla V100-SXM2-16GB            On | 00000000:8F:00.0 Off |                    0 |
| N/A   36C    P0               38W / 300W|      4MiB / 16384MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   8  Tesla V100-SXM2-16GB            On | 00000000:90:00.0 Off |                    0 |
| N/A   37C    P0               40W / 300W|      4MiB / 16384MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A   1428813      C   python                                    11668MiB |
|    1   N/A  N/A   1428813      C   python                                     7892MiB |
|    2   N/A  N/A   1428813      C   python                                     7256MiB |
|    3   N/A  N/A   1428813      C   python                                     6916MiB |
+---------------------------------------------------------------------------------------+
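
Limiting to 4 GPUs was just a narrower CUDA_VISIBLE_DEVICES, roughly:

import os

# Keep only the first four V100s. With batch size 64 that is 64 / 4 = 16
# samples per GPU, the same per-GPU load as 128 / 8 across all eight.
os.environ["CUDA_VISIBLE_DEVICES"] = "0,1,2,3"

import torch
print(torch.cuda.device_count())  # 4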

I'd appreciate any advice on getting this working across all eight V100s.
