
train.py hangs after launch, then fails with RuntimeError: CuDNN error: CUDNN_STATUS_EXECUTION_FAILED

Open miomiora opened this issue 1 year ago • 0 comments

Initialzing...
Initializing data source...
Data initialization complete.
Initializing model...
Model initialization complete.
Training START
Traceback (most recent call last):
  File "train.py", line 21, in <module>
    m.fit()
  File "/home/czk/code/GaitSet/model/model.py", line 159, in fit
    feature, label_prob = self.encoder(*seq, batch_frame)
  File "/home/czk/anaconda3/envs/GaitSet/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/czk/anaconda3/envs/GaitSet/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 121, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "/home/czk/anaconda3/envs/GaitSet/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/czk/code/GaitSet/model/network/gaitset.py", line 90, in forward
    x = self.set_layer1(x)
  File "/home/czk/anaconda3/envs/GaitSet/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/czk/code/GaitSet/model/network/basic_blocks.py", line 24, in forward
    x = self.forward_block(x.view(-1,c,h,w))
  File "/home/czk/anaconda3/envs/GaitSet/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/czk/code/GaitSet/model/network/basic_blocks.py", line 11, in forward
    x = self.conv(x)
  File "/home/czk/anaconda3/envs/GaitSet/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/czk/anaconda3/envs/GaitSet/lib/python3.6/site-packages/torch/nn/modules/conv.py", line 301, in forward
    self.padding, self.dilation, self.groups)
RuntimeError: CuDNN error: CUDNN_STATUS_EXECUTION_FAILED

After I run python train.py, nothing seems to happen; after a long wait it fails with the error above.

pip list
Package         Version
--------------- ------------
certifi         2021.5.30
cffi            1.14.6
imageio         2.15.0
mkl-fft         1.0.6
mkl-random      1.0.1
numpy           1.15.4
opencv-python   4.1.2.30
pandas          1.1.5
Pillow          8.4.0
pip             21.2.2
pycparser       2.21
python-dateutil 2.8.2
pytz            2023.3.post1
scipy           1.5.4
setuptools      58.0.4
six             1.16.0
TBB             0.2
torch           0.4.1
wheel           0.37.1
xarray          0.16.2

While python train.py is running, GPU utilization stays very low.
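
The pip list above shows torch 0.4.1 but not the CUDA or cuDNN versions it was built against, nor the GPU model. Below is a minimal sketch for collecting those details in the same environment. The helper name `collect_env` is hypothetical (it is not part of GaitSet), and the script only reports versions; it makes no claim about what causes the error.

```python
import platform


def collect_env():
    """Return a dict describing the Python / PyTorch / CUDA / cuDNN setup.

    `collect_env` is a hypothetical helper for this report, not a GaitSet API.
    """
    info = {
        "python": platform.python_version(),
        "torch": None,
        "cuda_build": None,      # CUDA version PyTorch was compiled against
        "cudnn": None,           # cuDNN version PyTorch sees, if any
        "cuda_available": False,
        "devices": [],           # (name, compute capability) per visible GPU
    }
    try:
        import torch
    except ImportError:
        # PyTorch is not installed in this interpreter; report what we have.
        return info

    info["torch"] = torch.__version__
    info["cuda_build"] = torch.version.cuda
    info["cudnn"] = torch.backends.cudnn.version()
    info["cuda_available"] = torch.cuda.is_available()
    if info["cuda_available"]:
        for i in range(torch.cuda.device_count()):
            info["devices"].append(
                (torch.cuda.get_device_name(i),
                 torch.cuda.get_device_capability(i))
            )
    return info


if __name__ == "__main__":
    for key, value in collect_env().items():
        print("{}: {}".format(key, value))
```

Running this inside the GaitSet conda environment and pasting the output next to the traceback makes it easier to compare the installed build against the GPU's compute capability and driver.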

miomiora, Dec 11 '23 07:12