
error in modulated_deformable_im2col_cuda: no kernel image is available for execution on the device

Open knotgrass opened this issue 3 years ago • 3 comments

I ran training in Google Colab and got this error: error in modulated_deformable_im2col_cuda: no kernel image is available for execution on the device

input:
%cd /content/gdrive/MyDrive/CenterNet/src
!python main.py ctdet --exp_id ocvit --batch_size 16 --lr 1.25e-4 --gpus 0

output:
Fix size testing.
training chunk_sizes: [16]
The output will be saved to /content/gdrive/MyDrive/CenterNet/src/lib/../../exp/ctdet/ocvit
heads {'hm': 2, 'wh': 2, 'reg': 2}
Namespace(K=100, aggr_weight=0.0, agnostic_ex=False, arch='dla_34', aug_ddd=0.5, aug_rot=0, batch_size=16, cat_spec_wh=False, center_thresh=0.1, chunk_sizes=[16], data_dir='/content/gdrive/MyDrive/CenterNet/src/lib/../../data', dataset='ocvit', debug=0, debug_dir='/content/gdrive/MyDrive/CenterNet/src/lib/../../exp/ctdet/ocvit/debug', debugger_theme='white', demo='', dense_hp=False, dense_wh=False, dep_weight=1, dim_weight=1, down_ratio=4, eval_oracle_dep=False, eval_oracle_hm=False, eval_oracle_hmhp=False, eval_oracle_hp_offset=False, eval_oracle_kps=False, eval_oracle_offset=False, eval_oracle_wh=False, exp_dir='/content/gdrive/MyDrive/CenterNet/src/lib/../../exp/ctdet', exp_id='ocvit', fix_res=True, flip=0.5, flip_test=False, gpus=[0], gpus_str='0', head_conv=256, heads={'hm': 2, 'wh': 2, 'reg': 2}, hide_data_time=False, hm_hp=True, hm_hp_weight=1, hm_weight=1, hp_weight=1, input_h=512, input_res=512, input_w=512, keep_res=False, kitti_split='3dop', load_model='', lr=0.000125, lr_step=[90, 120], master_batch_size=16, mean=array([[[0.472459, 0.47508 , 0.482652]]], dtype=float32), metric='loss', mse_loss=False, nms=False, no_color_aug=False, norm_wh=False, not_cuda_benchmark=False, not_hm_hp=False, not_prefetch_test=False, not_rand_crop=False, not_reg_bbox=False, not_reg_hp_offset=False, not_reg_offset=False, num_classes=2, num_epochs=140, num_iters=-1, num_stacks=1, num_workers=4, off_weight=1, output_h=128, output_res=128, output_w=128, pad=31, peak_thresh=0.2, print_iter=0, rect_mask=False, reg_bbox=True, reg_hp_offset=True, reg_loss='l1', reg_offset=True, resume=False, root_dir='/content/gdrive/MyDrive/CenterNet/src/lib/../..', rot_weight=1, rotate=0, save_all=False, save_dir='/content/gdrive/MyDrive/CenterNet/src/lib/../../exp/ctdet/ocvit', scale=0.4, scores_thresh=0.1, seed=317, shift=0.1, std=array([[[0.255084, 0.254665, 0.257073]]], dtype=float32), task='ctdet', test=False, test_scales=[1.0], trainval=False, val_intervals=5, vis_thresh=0.3, wh_weight=0.1)
Creating model...
Setting up data...
==> initializing Ocvit val data.
loading annotations into memory...
Done (t=0.00s)
creating index...
index created!
Loaded val 5 samples
==> initializing Ocvit train data.
loading annotations into memory...
Done (t=0.01s)
creating index...
index created!
Loaded train 1462 samples
Starting training...
ctdet/ocvit
error in modulated_deformable_im2col_cuda: no kernel image is available for execution on the device
Traceback (most recent call last):
  File "main.py", line 102, in <module>
    main(opt)
  File "main.py", line 70, in main
    log_dict_train, _ = trainer.train(epoch, train_loader)
  File "/content/gdrive/MyDrive/CenterNet/src/lib/trains/base_trainer.py", line 119, in train
    return self.run_epoch('train', epoch, data_loader)
  File "/content/gdrive/MyDrive/CenterNet/src/lib/trains/base_trainer.py", line 69, in run_epoch
    output, loss, loss_stats = model_with_loss(batch)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/content/gdrive/MyDrive/CenterNet/src/lib/trains/base_trainer.py", line 19, in forward
    outputs = self.model(batch['input'])
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/content/gdrive/MyDrive/CenterNet/src/lib/models/networks/pose_dla_dcn.py", line 472, in forward
    x = self.dla_up(x)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/content/gdrive/MyDrive/CenterNet/src/lib/models/networks/pose_dla_dcn.py", line 411, in forward
    ida(layers, len(layers) -i - 2, len(layers))
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/content/gdrive/MyDrive/CenterNet/src/lib/models/networks/pose_dla_dcn.py", line 384, in forward
    layers[i] = upsample(project(layers[i]))
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/conv.py", line 778, in forward
    output_padding, self.groups, self.dilation)
RuntimeError: cuDNN error: CUDNN_STATUS_INTERNAL_ERROR

Has anyone else run into this problem? Any help would be appreciated.

knotgrass avatar May 19 '21 14:05 knotgrass
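
The CUDA runtime message "no kernel image is available for execution on the device" generally means a compiled extension contains no code for the GPU's compute capability. The sketch below is one possible way to reason about that mismatch; it is not taken from the CenterNet or DCNv2 code. The function name `kernel_image_available` and the example architecture lists are hypothetical, and the matching rule is a simplification of CUDA's binary compatibility rules.

```python
def kernel_image_available(capability, compiled_archs):
    """Rough check: can a binary built for `compiled_archs` run on this GPU?

    capability     -- (major, minor) tuple, e.g. the value returned by
                      torch.cuda.get_device_capability(0)
    compiled_archs -- hypothetical list of targets the extension was built
                      for, e.g. ["sm_60", "compute_70"]

    Simplified rule (an assumption of this sketch):
      * "sm_XY" machine code runs on a device with the same major version
        and a minor version >= Y.
      * "compute_XY" PTX can be JIT-compiled for any device >= X.Y.
    """
    major, minor = capability
    for arch in compiled_archs:
        kind, _, digits = arch.partition("_")
        if len(digits) < 2 or not digits.isdigit():
            continue  # skip entries this sketch does not understand
        arch_major, arch_minor = int(digits[:-1]), int(digits[-1])
        if kind == "sm" and arch_major == major and arch_minor <= minor:
            return True
        if kind == "compute" and (arch_major, arch_minor) <= (major, minor):
            return True
    return False


# Example: a 7.5 device with an extension built only for older targets.
print(kernel_image_available((7, 5), ["sm_35", "sm_50", "sm_60"]))
```

If a check like this fails for the GPU that Colab assigned, rebuilding the extension for that architecture is the usual direction to investigate; the exact build flags depend on the DCNv2 version in use.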

I ran it in Google Colab too, and I also hit an error.

input:
%cd /content/CenterNet/src/lib/models/networks/DCNv2
!./make.sh

output:
/content/CenterNet/src/lib/models/networks/DCNv2
/content/CenterNet/src/lib/models/networks/DCNv2
Traceback (most recent call last):
  File "build.py", line 21, in <module>
    raise ValueError('CUDA is not available')
ValueError: CUDA is not available
Traceback (most recent call last):
  File "build_double.py", line 21, in <module>
    raise ValueError('CUDA is not available')
ValueError: CUDA is not available

Can you help me solve it? Thank you.

ifanshida avatar Dec 28 '21 07:12 ifanshida
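
A "CUDA is not available" error at build time suggests PyTorch could not see a GPU when `make.sh` ran, which in Colab can happen when the runtime has no GPU attached. The following is a minimal sketch for confirming what PyTorch sees before building; the helper name `cuda_report` is hypothetical and not part of CenterNet or DCNv2.

```python
def cuda_report():
    """Collect basic facts about the PyTorch/CUDA setup in this runtime."""
    try:
        import torch
    except ImportError:
        # PyTorch itself is missing, so no CUDA check is possible.
        return {"torch_installed": False, "cuda_available": False}

    report = {
        "torch_installed": True,
        "torch_version": torch.__version__,
        "cuda_available": torch.cuda.is_available(),
    }
    if report["cuda_available"]:
        # Name and (major, minor) compute capability of GPU 0.
        report["device_name"] = torch.cuda.get_device_name(0)
        report["capability"] = torch.cuda.get_device_capability(0)
    return report


print(cuda_report())
```

If `cuda_available` comes back `False`, the build scripts shown above would be expected to fail in the same way until a GPU runtime is selected.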

Same issue here

loaded models/ctdet_coco_dla_2x.pth, epoch 230
error in modulated_deformable_im2col_cuda: no kernel image is available for execution on the device
THCudaCheck FAIL file=/opt/conda/conda-bld/pytorch_1544174967633/work/aten/src/THC/THCCachingHostAllocator.cpp line=265 error=77 : an illegal memory access was encountered
Traceback (most recent call last):

natwille1 avatar Jan 25 '23 12:01 natwille1

Did you solve it? I am hitting the same error as in the comment above.

yuan243212790 avatar Feb 12 '23 07:02 yuan243212790