[DCNv4 ERROR] CUDA op error when using internimage-L and internimage-LX with DCNv4, although internimage-B works well with DCNv4

Open dou3516 opened this issue 1 year ago • 5 comments

I get a CUDA op error when using internimage-L and internimage-LX with DCNv4; however, internimage-B works well with DCNv4. What is wrong?

Environments:

DCNv4: built from https://github.com/OpenGVLab/DCNv4/tree/main/DCNv4_op/make.sh

DCNv3: built from https://github.com/OpenGVLab/InternImage/tree/master/segmentation/ops_dcnv3/make.sh

internimage-L config:

    backbone=dict(
        _delete_=True,
        type='InternImage',
        core_op='DCNv3',
        channels=160,
        depths=[5, 5, 22, 5],
        groups=[10, 20, 40, 80],
        mlp_ratio=4.,
        drop_path_rate=0.5, 
        norm_layer='LN',
        layer_scale=1.0,
        offset_scale=2.0,
        post_norm=True,
        with_cp=False,
        out_indices=(0, 1, 2, 3),
        dcn_output_bias=True,  # dcnv4
        mlp_fc2_bias=True,  # dcnv4
        dw_kernel_size=3,  # dcnv4
        use_dcn_v4_op=use_dcn_v4_op,  # dcnv4
        init_cfg=dict(type='Pretrained', checkpoint=pretrained)),

error log:

    error in dcnv4_im2col_cuda: invalid configuration argument
    launch arguments: gridDim=(1568, 1, 1), blockDim=(16, 80, 1), shm_size=5760
    ...
    ...
      File "/home/miniconda3/envs/dcnv4/lib/python3.9/site-packages/DCNv4-1.0.0.post2-py3.9-linux-x86_64.egg/DCNv4/functions/dcnv4_func.py", line 125, in backward
        ext.dcnv4_backward(*args)
    RuntimeError: falseINTERNAL ASSERT FAILED at "/home/dbc/AIcode/DL/SS/mmsegmentation-dev1.x/DCNv4_op/src/cuda/dcnv4_col2im_cuda.cuh":470, please report a bug to PyTorch. kernel launch error
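
One possible reading of the launch arguments in this log, offered as a sketch rather than a confirmed diagnosis: blockDim=(16, 80, 1) requests 16 x 80 = 1280 threads per block, while CUDA devices of current compute capabilities allow at most 1024, which is a common cause of "invalid configuration argument". The requested shared memory (5760 bytes) is well below the usual 48 KB default per-block limit. The two limits below are general CUDA values and are assumed to apply here, since the thread does not name the GPU.

```python
# Launch arguments copied from the error log in this issue.
grid_dim = (1568, 1, 1)
block_dim = (16, 80, 1)
shm_size = 5760  # bytes

# General CUDA limits (assumed to apply to the GPU used here; not stated in the thread).
MAX_THREADS_PER_BLOCK = 1024
DEFAULT_MAX_SHARED_BYTES = 48 * 1024


def threads_per_block(dim):
    """Total threads in one block is the product of the three block dimensions."""
    x, y, z = dim
    return x * y * z


threads = threads_per_block(block_dim)
print("threads per block:", threads, "limit:", MAX_THREADS_PER_BLOCK)
print("exceeds thread limit:", threads > MAX_THREADS_PER_BLOCK)
print("shared memory within default limit:", shm_size <= DEFAULT_MAX_SHARED_BYTES)
```

If this reading is right, the launch is rejected because of the block size, not because of the amount of shared memory requested.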

dou3516 avatar Feb 01 '24 03:02 dou3516

Hi, what is the shape of your input tensor? Since DCNv4 uses shared memory to store tensors, tensors with extremely large shapes will cause errors.

zhiqi-li avatar Mar 11 '24 09:03 zhiqi-li

> Hi, what is the shape of your input tensor? Since DCNv4 uses shared memory to store tensors, tensors with extremely large shapes will cause errors.

B x C x H x W = 8 x 3 x 448 x 448
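
The reported shape can be checked against the launch arguments in the error log (gridDim=(1568, 1, 1), blockDim=(16, 80, 1)). The sketch below assumes that the four backbone stages run at strides 4, 8, 16 and 32, that the channel width doubles per stage, and that the kernel uses one block per output position with one thread per channel. None of these is confirmed in the thread, and the InternImage-B values (channels=112, groups=[7, 14, 28, 56]) are recalled from the InternImage configs, not taken from this issue.

```python
# Input shape reported in this thread: B x C x H x W = 8 x 3 x 448 x 448.
BATCH, HEIGHT, WIDTH = 8, 448, 448

# Assumptions (not stated in the thread): stage strides and channel doubling.
STRIDES = (4, 8, 16, 32)
MAX_THREADS_PER_BLOCK = 1024  # general CUDA limit


def stage_summary(channels, groups, stage):
    """Describe one backbone stage under the assumptions above."""
    h, w = HEIGHT // STRIDES[stage], WIDTH // STRIDES[stage]
    stage_channels = channels * 2 ** stage
    group_channels = stage_channels // groups[stage]
    threads = group_channels * groups[stage]  # hypothetical: one thread per channel
    return {
        "positions": BATCH * h * w,
        "group_channels": group_channels,
        "groups": groups[stage],
        "threads": threads,
        "fits": threads <= MAX_THREADS_PER_BLOCK,
    }


# InternImage-L values from the config in this issue.
large = stage_summary(channels=160, groups=[10, 20, 40, 80], stage=3)
# InternImage-B values are an assumption (see the note above the code).
base = stage_summary(channels=112, groups=[7, 14, 28, 56], stage=3)

print("L, last stage:", large)
print("B, last stage:", base)
```

Under these assumptions the last stage of InternImage-L has 8 x 14 x 14 = 1568 positions, 16 channels per group and 80 groups, which matches the logged launch arguments, while the same stage of InternImage-B would stay under the thread limit. That would be consistent with B working and L failing, but it remains an inference from the log.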

dou3516 avatar Mar 26 '24 07:03 dou3516

Hello, I have the same problem. Have you solved it?

yan-hao-tian avatar Aug 27 '24 07:08 yan-hao-tian