
RuntimeError: cuda runtime error (2) when training

Open tangtaogo opened this issue 5 years ago • 1 comments

When I run train.py on my own dataset, I get a CUDA error: out of memory. How can I solve it? My GPU is a Titan X, the batch size is 2, and chunk_sizes is [2]. I don't know why this happens and need help.

{'batch_size': 2, 'cache_dir': 'cache', 'chunk_sizes': [2], 'config_dir': 'config', 'data_dir': '/data//Detection/CenterNet/data', 'data_rng': <mtrand.RandomState object at 0x7f5cfbb834c8>, 'dataset': 'MSCOCO', 'decay_rate': 10, 'display': 5, 'learning_rate': 0.00025, 'max_iter': 50000, 'nnet_rng': <mtrand.RandomState object at 0x7f5cfbb83510>, 'opt_algo': 'adam', 'prefetch_size': 6, 'pretrain': None, 'result_dir': 'results', 'sampling_function': 'kp_detection', 'snapshot': 500, 'snapshot_name': 'CenterNet-52', 'stepsize': 500, 'test_split': 'testdev', 'train_split': 'trainval', 'val_iter': 500, 'val_split': 'minival', 'weight_decay': False, 'weight_decay_rate': 1e-05, 'weight_decay_type': 'l2'}
db config...
{'ae_threshold': 0.5, 'border': 128, 'categories': 3, 'data_aug': True, 'gaussian_bump': True, 'gaussian_iou': 0.7, 'gaussian_radius': -1, 'input_size': [511, 511], 'kp_categories': 1, 'lighting': True, 'max_per_image': 100, 'merge_bbox': False, 'nms_algorithm': 'exp_soft_nms', 'nms_kernel': 3, 'nms_threshold': 0.5, 'output_sizes': [[128, 128]], 'rand_color': True, 'rand_crop': True, 'rand_pushes': False, 'rand_samples': False, 'rand_scale_max': 1.2, 'rand_scale_min': 0.8, 'rand_scale_step': 0.1, 'rand_scales': array([0.8, 0.9, 1. , 1.1]), 'special_crop': False, 'test_scales': [1], 'top_k': 70, 'weight_exp': 8}
len of db: 1440
start prefetching data...
shuffling indices...
start prefetching data...
shuffling indices...
start prefetching data...
shuffling indices...
start prefetching data...
shuffling indices...
start prefetching data...
shuffling indices...
building model...
module_file: models.CenterNet-52
THCudaCheck FAIL file=/opt/conda/conda-bld/pytorch_1532581333611/work/aten/src/THC/THCCachingHostAllocator.cpp line=271 error=2 : out of memory
Exception in thread Thread-1:
Traceback (most recent call last):
  File "/data//anaconda3/envs/CenterNet/lib/python3.6/threading.py", line 916, in _bootstrap_inner
    self.run()
  File "/data/******/anaconda3/envs/CenterNet/lib/python3.6/threading.py", line 864, in run
    self._target(*self._args, **self._kwargs)
  File "train.py", line 54, in pin_memory
    data["xs"] = [x.pin_memory() for x in data["xs"]]
  File "train.py", line 54, in
    data["xs"] = [x.pin_memory() for x in data["xs"]]
RuntimeError: cuda runtime error (2) : out of memory at /opt/conda/conda-bld/pytorch_1532581333611/work/aten/src/THC/THCCachingHostAllocator.cpp:271
.........

tangtaogo, Aug 06 '19 12:08

I'm running into the same problem. Have you solved it?

wangminj, Jun 16 '20 13:06
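
One thing worth noting from the traceback: the failure is raised inside x.pin_memory() in the pin_memory thread, and the file named is THCCachingHostAllocator.cpp, which handles page-locked (pinned) host memory. So the allocation that failed may be on the host side rather than in the model's forward pass, in which case lowering batch_size alone might not help. Possible causes, all unconfirmed here, include the GPU already being occupied by another process or a limit on lockable host memory. As a diagnostic, a minimal sketch of a fallback that keeps the unpinned tensors when pinning fails could look like the following. The helper name pin_or_keep and the keys argument are hypothetical and not part of the CenterNet code; only data["xs"] and pin_memory() appear in the traceback above.

```python
def pin_or_keep(data, keys=("xs",)):
    """Try to pin the tensors stored under each key of `data`.

    Hypothetical helper, not part of CenterNet's train.py. If pinning
    raises RuntimeError (as in the traceback above), the original
    unpinned tensors are kept so training can continue with ordinary
    pageable host memory, at the cost of slower host-to-device copies.
    """
    result = dict(data)  # shallow copy so the caller's dict is untouched
    for key in keys:
        try:
            result[key] = [t.pin_memory() for t in data[key]]
        except RuntimeError as err:
            # Pinned allocation failed: report it and fall back.
            print("pin_memory failed for %r, using unpinned tensors: %s" % (key, err))
            result[key] = list(data[key])
    return result
```

If training proceeds with this fallback in place, that would suggest the problem is specific to pinned host allocation; if it then fails later on the GPU, the device memory itself is probably exhausted.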