HRNet-Semantic-Segmentation
RuntimeError: cuda runtime error (11) : invalid argument at /pytorch/aten/src/THC/THCGeneral.cpp:663
I tried to train on the LIP dataset using a single GPU. I changed the config file as follows:
GPUS: (0,)
WORKERS: 1
BATCH_SIZE_PER_GPU: 1
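For reference, the config dump printed in the log shows GPUS and WORKERS at the top level and BATCH_SIZE_PER_GPU under TRAIN. A minimal sketch of the same three overrides applied to a plain nested dict follows. The helper name apply_overrides and the starting values are hypothetical and are not part of the repo; the sketch only illustrates where each key lives.

```python
import copy


def apply_overrides(cfg, overrides):
    """Return a copy of cfg with dotted-key overrides applied (hypothetical helper)."""
    out = copy.deepcopy(cfg)
    for dotted_key, value in overrides.items():
        node = out
        *parents, leaf = dotted_key.split(".")
        for name in parents:
            node = node[name]  # raises KeyError if the section does not exist
        if leaf not in node:
            raise KeyError(f"unknown config key: {dotted_key}")
        node[leaf] = value
    return out


# Starting values are illustrative placeholders, not the repo defaults.
base_cfg = {"GPUS": (0, 1, 2, 3), "WORKERS": 4, "TRAIN": {"BATCH_SIZE_PER_GPU": 10}}

single_gpu_cfg = apply_overrides(base_cfg, {
    "GPUS": (0,),
    "WORKERS": 1,
    "TRAIN.BATCH_SIZE_PER_GPU": 1,
})
print(single_gpu_cfg)
```

Rejecting unknown keys is a deliberate choice here: a misspelled key fails loudly instead of silently creating a setting that nothing reads.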
Running python3 tools/train.py --cfg experiments/lip/seg_hrnet_w48_473x473_sgd_lr7e-3_wd5e-4_bs_40_epoch150.yaml
produced the following output:
/usr/local/lib/python3.6/dist-packages/torch/utils/cpp_extension.py:118: UserWarning:
!! WARNING !!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Your compiler (c++) may be ABI-incompatible with PyTorch!
Please use a compiler that is ABI-compatible with GCC 4.9 and above.
See https://gcc.gnu.org/onlinedocs/libstdc++/manual/abi.html.
See https://gist.github.com/goldsborough/d466f43e8ffc948ff92de7486c5216d6
for instructions on how to install GCC 4.9 or higher.
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!! WARNING !!
warnings.warn(ABI_INCOMPATIBILITY_WARNING.format(compiler))
/usr/local/lib/python3.6/dist-packages/torch/utils/cpp_extension.py:118: UserWarning:
!! WARNING !!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Your compiler (c++) may be ABI-incompatible with PyTorch!
Please use a compiler that is ABI-compatible with GCC 4.9 and above.
See https://gcc.gnu.org/onlinedocs/libstdc++/manual/abi.html.
See https://gist.github.com/goldsborough/d466f43e8ffc948ff92de7486c5216d6
for instructions on how to install GCC 4.9 or higher.
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!! WARNING !!
warnings.warn(ABI_INCOMPATIBILITY_WARNING.format(compiler))
=> creating output/lip/seg_hrnet_w48_473x473_sgd_lr7e-3_wd5e-4_bs_40_epoch150
=> creating log/lip/seg_hrnet/seg_hrnet_w48_473x473_sgd_lr7e-3_wd5e-4_bs_40_epoch150_2020-04-07-14-05
Namespace(cfg='experiments/lip/seg_hrnet_w48_473x473_sgd_lr7e-3_wd5e-4_bs_40_epoch150.yaml', opts=[])
AUTO_RESUME: False
CUDNN:
  BENCHMARK: True
  DETERMINISTIC: False
  ENABLED: True
DATASET:
  DATASET: lip
  EXTRA_TRAIN_SET:
  NUM_CLASSES: 20
  ROOT: data/
  TEST_SET: list/lip/valList.txt
  TRAIN_SET: list/lip/trainList.txt
DEBUG:
  DEBUG: False
  SAVE_BATCH_IMAGES_GT: False
  SAVE_BATCH_IMAGES_PRED: False
  SAVE_HEATMAPS_GT: False
  SAVE_HEATMAPS_PRED: False
GPUS: (0,)
LOG_DIR: log
LOSS:
  CLASS_BALANCE: True
  OHEMKEEP: 131072
  OHEMTHRES: 0.9
  USE_OHEM: False
MODEL:
  EXTRA:
    FINAL_CONV_KERNEL: 1
    STAGE1:
      BLOCK: BOTTLENECK
      FUSE_METHOD: SUM
      NUM_BLOCKS: [4]
      NUM_CHANNELS: [64]
      NUM_MODULES: 1
      NUM_RANCHES: 1
    STAGE2:
      BLOCK: BASIC
      FUSE_METHOD: SUM
      NUM_BLOCKS: [4, 4]
      NUM_BRANCHES: 2
      NUM_CHANNELS: [48, 96]
      NUM_MODULES: 1
    STAGE3:
      BLOCK: BASIC
      FUSE_METHOD: SUM
      NUM_BLOCKS: [4, 4, 4]
      NUM_BRANCHES: 3
      NUM_CHANNELS: [48, 96, 192]
      NUM_MODULES: 4
    STAGE4:
      BLOCK: BASIC
      FUSE_METHOD: SUM
      NUM_BLOCKS: [4, 4, 4, 4]
      NUM_BRANCHES: 4
      NUM_CHANNELS: [48, 96, 192, 384]
      NUM_MODULES: 3
  NAME: seg_hrnet
  PRETRAINED: pretrained_models/hrnetv2_w48_imagenet_pretrained.pth
OUTPUT_DIR: output
PIN_MEMORY: True
PRINT_FREQ: 100
RANK: 0
TEST:
  BASE_SIZE: 473
  BATCH_SIZE_PER_GPU: 16
  FLIP_TEST: False
  IMAGE_SIZE: [473, 473]
  MODEL_FILE:
  MULTI_SCALE: False
  NUM_SAMPLES: 2000
  SCALE_LIST: [1]
TRAIN:
  BASE_SIZE: 473
  BATCH_SIZE_PER_GPU: 1
  BEGIN_EPOCH: 0
  DOWNSAMPLERATE: 1
  END_EPOCH: 150
  EXTRA_EPOCH: 0
  EXTRA_LR: 0.001
  FLIP: True
  IGNORE_LABEL: 255
  IMAGE_SIZE: [473, 473]
  LR: 0.007
  LR_FACTOR: 0.1
  LR_STEP: [90, 110]
  MOMENTUM: 0.9
  MULTI_SCALE: True
  NESTEROV: False
  NUM_SAMPLES: 0
  OPTIMIZER: sgd
  RESUME: True
  SCALE_FACTOR: 11
  SHUFFLE: True
  WD: 0.0005
WORKERS: 1
=> init weights from normal distribution
THCudaCheck FAIL file=/pytorch/aten/src/THC/THCGeneral.cpp line=663 error=11 : invalid argument
Traceback (most recent call last):
  File "tools/train.py", line 243, in <module>
    main()
  File "tools/train.py", line 81, in main
    logger.info(get_model_summary(model.cuda(), dump_input.cuda()))
  File "/home/ssdemo/tools/../lib/utils/modelsummary.py", line 90, in get_model_summary
    model(*input_tensors)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ssdemo/tools/../lib/models/seg_hrnet.py", line 414, in forward
    x = self.conv1(x)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/conv.py", line 301, in forward
    self.padding, self.dilation, self.groups)
RuntimeError: cuda runtime error (11) : invalid argument at /pytorch/aten/src/THC/THCGeneral.cpp:663
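The traceback shows the failure on the first convolution, inside the model summary call and before any training step, so the dataset and batch size are unlikely to be involved. One possible cause (an assumption, not confirmed in this thread) is a mismatch between the installed PyTorch build, its CUDA version, and the GPU. Below is a minimal diagnostic sketch that reports the versions and tries one small convolution on the GPU. The function name cuda_smoke_test is hypothetical and is not part of the repo.

```python
def cuda_smoke_test():
    """Report PyTorch/CUDA versions and try one tiny convolution on the GPU."""
    report = {"torch": None, "cuda_build": None, "cuda_available": False,
              "device": None, "conv_ok": None, "error": None}
    try:
        import torch
    except ImportError:
        return report  # PyTorch is not installed in this environment

    report["torch"] = torch.__version__
    report["cuda_build"] = torch.version.cuda  # None for CPU-only builds
    report["cuda_available"] = torch.cuda.is_available()
    if not report["cuda_available"]:
        return report

    report["device"] = torch.cuda.get_device_name(0)
    try:
        # Same kind of operation that fails in the traceback: a conv on the GPU.
        conv = torch.nn.Conv2d(3, 8, kernel_size=3, padding=1).cuda()
        x = torch.rand(1, 3, 32, 32).cuda()
        conv(x)
        torch.cuda.synchronize()  # surface asynchronous CUDA errors here
        report["conv_ok"] = True
    except RuntimeError as err:
        report["conv_ok"] = False
        report["error"] = str(err)
    return report


report = cuda_smoke_test()
for key, value in report.items():
    print(f"{key}: {value}")
```

If conv_ok comes back False with the same "invalid argument" message, the problem can be reproduced without the HRNet code, which points at the environment rather than the config.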
Hi, thank you very much for your great work!
With the following setup, I was able to train on a single GPU:
1. I used the 'pytorch-V1.1' branch instead of 'master'.
2. I used the 'pytorch/pytorch:1.3-cuda10.1-cudnn7-runtime' Docker image.
3. I made the adjustment described in #94 (https://github.com/HRNet/HRNet-Semantic-Segmentation/issues/94#issuecomment-612194299)
This is just a short note to inform you.
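The Docker image tag above implies PyTorch 1.3 built against CUDA 10.1. Below is a small sketch for checking whether a local install matches that combination. The helper names are hypothetical, and the sample version strings in the last two lines are illustrative values, not versions reported in this thread.

```python
import re


def version_tuple(text):
    """Parse the leading numeric components, e.g. '1.3.0+cu101' -> (1, 3, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"no version number in {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


# Taken from the image tag 'pytorch/pytorch:1.3-cuda10.1-cudnn7-runtime'.
REPORTED_TORCH = (1, 3)
REPORTED_CUDA = (10, 1)


def matches_reported(torch_version, cuda_version):
    """True if major.minor of both versions equal the reported working pair."""
    return (version_tuple(torch_version)[:2] == REPORTED_TORCH
            and version_tuple(cuda_version)[:2] == REPORTED_CUDA)


print(matches_reported("1.3.0", "10.1"))  # True
print(matches_reported("0.4.1", "9.0"))   # False
```

In practice the two arguments would come from torch.__version__ and torch.version.cuda. A mismatch does not prove the install is broken; it only means the environment differs from the one reported to work here.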
I have run into the same problem. Have you solved it? Thanks.