FOTS.PyTorch
cuDNN error when training on GPU
Memory Usage:
CUDA: 7 Allocated: 108.9853515625 MB Cached: 113.375 MB
['img_589.jpg', 'img_244.jpg', 'img_983.jpg', 'img_16.jpg']
Traceback (most recent call last):
  File "train.py", line 75, in <module>
    main(config, args.resume)
  File "train.py", line 50, in main
    trainer.train()
  File "/data/home/zjw/pythonFile/FOTS.PyTorch/FOTS/base/base_trainer.py", line 78, in train
    result = self._train_epoch(epoch)
  File "/data/home/zjw/pythonFile/FOTS.PyTorch/FOTS/trainer/trainer.py", line 71, in _train_epoch
    pred_score_map, pred_geo_map, pred_recog, pred_boxes, pred_mapping, indices = self.model.forward(img, boxes, mapping)
  File "/data/home/zjw/pythonFile/FOTS.PyTorch/FOTS/model/model.py", line 109, in forward
    rois, lengths, indices = self.roirotate(feature_map, boxes[:, :8], mapping)
  File "/data/home/zjw/anaconda3/envs/pytorch-env/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/data/home/zjw/pythonFile/FOTS.PyTorch/FOTS/model/modules/roi_rotate.py", line 94, in forward
    grid = nn.functional.affine_grid(matrixes, images.size())
  File "/data/home/zjw/anaconda3/envs/pytorch-env/lib/python3.7/site-packages/torch/nn/functional.py", line 2615, in affine_grid
    return vision.affine_grid_generator(theta, size)
  File "/data/home/zjw/anaconda3/envs/pytorch-env/lib/python3.7/site-packages/torch/nn/_functions/vision.py", line 10, in affine_grid_generator
    ret = torch.cudnn_affine_grid_generator(theta, N, C, H, W)
RuntimeError: cuDNN error: CUDNN_STATUS_EXECUTION_FAILED
========================================================
My environment: Ubuntu 14.04, CUDA 8.0, cuDNN 7.1, PyTorch 1.0.
When I train on GPUs, the cuDNN error above occurs at the affine_grid call in roi_rotate.py. Can anyone help me fix it? Thanks a lot.
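
One check that might narrow this down is whether the arguments reaching `affine_grid` have the shapes the 2D case expects: `theta` of shape (N, 2, 3) and `size` of the form (N, C, H, W), with matching N and no empty dimensions. The sketch below is only a diagnostic aid, not part of FOTS.PyTorch; the helper name `check_affine_grid_shapes` is hypothetical, and it is an assumption (not a confirmed cause) that a shape problem is behind this error.

```python
def check_affine_grid_shapes(theta_shape, size):
    """Return a list of problems with the shapes passed to affine_grid (2D case).

    theta_shape: shape of the affine matrices, expected (N, 2, 3)
    size: requested output size, expected (N, C, H, W)
    Hypothetical helper for debugging; accepts tuples or torch.Size.
    """
    theta_shape = tuple(theta_shape)
    size = tuple(size)
    problems = []

    # theta must be a batch of 2x3 affine matrices
    if len(theta_shape) != 3 or theta_shape[1:] != (2, 3):
        problems.append("theta should have shape (N, 2, 3), got %s" % (theta_shape,))

    # size must describe a 4D output
    if len(size) != 4:
        problems.append("size should be (N, C, H, W), got %s" % (size,))

    # both arguments must agree on the batch dimension
    if len(theta_shape) == 3 and len(size) == 4 and theta_shape[0] != size[0]:
        problems.append(
            "batch mismatch: theta N=%d, size N=%d" % (theta_shape[0], size[0])
        )

    # an empty batch or zero-sized feature map is worth flagging
    if any(d <= 0 for d in theta_shape + size):
        problems.append(
            "zero or negative dimension in %s / %s" % (theta_shape, size)
        )

    return problems


if __name__ == "__main__":
    # Example shapes only; the real values would come from roi_rotate.py.
    print(check_affine_grid_shapes((4, 2, 3), (4, 32, 8, 64)))
    print(check_affine_grid_shapes((0, 2, 3), (0, 32, 8, 64)))
```

In `roi_rotate.py` this could be called as `check_affine_grid_shapes(matrixes.shape, images.size())` just before the failing line. If the shapes look fine, another experiment is to set `torch.backends.cudnn.enabled = False` at the top of `train.py` and see whether the same call succeeds without cuDNN; that would suggest the problem lies in the CUDA 8.0 / cuDNN 7.1 / PyTorch 1.0 combination rather than in the inputs.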