[Bug] When the number of instances in an image is more than 512, it throws an error: src.cols > 0 && src.rows > 0 in function 'warpAffine'
Prerequisite
- [X] I have searched Issues and Discussions but cannot get the expected help.
- [X] I have read the FAQ documentation but cannot get the expected help.
- [X] The bug has not been fixed in the latest version (master) or latest version (3.x).
Task
I'm using the official example scripts/configs for the officially supported tasks/models/datasets.
Branch
master branch https://github.com/open-mmlab/mmdetection
Environment
Python: 3.8.5 (default, Sep 4 2020, 07:30:14) [GCC 7.3.0]
CUDA available: True
GPU 0,1,2,3,4,5,6,7: NVIDIA A100-PCIE-40GB
CUDA_HOME: /usr/local/cuda
NVCC: Build cuda_11.1.TC455_06.29069683_0
GCC: gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
PyTorch: 1.7.1
PyTorch compiling details: PyTorch built with:
- GCC 7.3
- C++ Version: 201402
- Intel(R) Math Kernel Library Version 2020.0.2 Product Build 20200624 for Intel(R) 64 architecture applications
- Intel(R) MKL-DNN v1.6.0 (Git Hash 5ef631a030a6f73131c77892041042805a06064f)
- OpenMP 201511 (a.k.a. OpenMP 4.5)
- NNPACK is enabled
- CPU capability usage: AVX2
- CUDA Runtime 11.0
- NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_37,code=compute_37
- CuDNN 8.0.5
- Magma 2.5.2
- Build settings: BLAS=MKL, BUILD_TYPE=Release, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DUSE_VULKAN_WRAPPER -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, USE_CUDA=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON,
TorchVision: 0.8.2
OpenCV: 4.5.4-dev
MMCV: 1.3.16
MMCV Compiler: GCC 7.3
MMCV CUDA Compiler: 11.0
MMDetection: 2.18.0+db256a1
Reproduces the problem - code sample
I use the following config to build the LVIS dataset and then iterate over it to extract the bboxes and masks from the annotations:
```python
import numpy as np
import cv2
import json
from tqdm import tqdm
from mmdet.datasets import build_dataset

debug = False

if __name__ == "__main__":
    dataset_type = 'LVISV1Dataset'
    data_root = 'xxx'
    train_pipeline = [
        dict(type='LoadDummyImageFromFile'),
        dict(type='LoadAnnotations', with_bbox=True, with_mask=True),
        dict(type='RandomFlip', flip_ratio=0.5),
        dict(type='Resize', img_scale=(512, 512), keep_ratio=True),
        dict(type='Pad', size_divisor=512),
        dict(type='FixedShift', fixed_size=512),
    ]
    trainset = dict(
        type=dataset_type,
        ann_file='lvis_v1_train.json',
        img_prefix=data_root,
        pipeline=train_pipeline)
    lvis_dataset = build_dataset(trainset)
```
However, after iterating over 2000+ images, it throws an error:
```
File "mmdetection/mmdet/datasets/custom.py", line 220, in __getitem__
    data = self.prepare_train_img(idx)
File "mmdetection/mmdet/datasets/custom.py", line 243, in prepare_train_img
    return self.pipeline(results)
File "mmdetection/mmdet/datasets/pipelines/compose.py", line 41, in __call__
    data = t(data)
File "mmdetection/mmdet/datasets/pipelines/transforms.py", line 575, in __call__
    results[key] = results[key].translate(
File "mmdetection/mmdet/core/mask/structures.py", line 420, in translate
    translated_masks = mmcv.imtranslate(
File "/data1/opt/.cache/matlab/anaconda3/lib/python3.8/site-packages/mmcv/image/geometric.py", line 719, in imtranslate
    translated = cv2.warpAffine(
cv2.error: OpenCV(4.5.4-dev) /tmp/pip-req-build-h45n7_hz/opencv/modules/imgproc/src/imgwarp.cpp:2595: error: (-215:Assertion failed) src.cols > 0 && src.rows > 0 in function 'warpAffine'
```
The error occurs in the FixedShift transform. Here is its code, lightly modified from RandomShift:
```python
@PIPELINES.register_module()
class FixedShift:
    """Shift the image and boxes by a fixed offset that centers the
    content inside a square of side ``fixed_size``.

    Args:
        shift_ratio (float): Probability of shifts. Default 1.
        fixed_size (int): Side length of the target square. Default 512.
        max_shift_px (int): The max pixels for shifting. Default 32.
        filter_thr_px (int): The width and height threshold for filtering.
            The bbox and the rest of the targets below the width and
            height threshold will be filtered. Default 1.
    """

    def __init__(self, shift_ratio=1, fixed_size=512, max_shift_px=32,
                 filter_thr_px=1):
        assert 0 <= shift_ratio <= 1
        assert max_shift_px >= 0
        self.shift_ratio = shift_ratio
        self.max_shift_px = max_shift_px
        self.filter_thr_px = int(filter_thr_px)
        self.fixed_size = fixed_size
        # The key correspondence from bboxes to labels.
        self.bbox2label = {
            'gt_bboxes': 'gt_labels',
            'gt_bboxes_ignore': 'gt_labels_ignore'
        }

    def __call__(self, results):
        """Call function to shift images and bounding boxes.

        Args:
            results (dict): Result dict from loading pipeline.

        Returns:
            dict: Shift results.
        """
        ori_img_h, ori_img_w = results['img_shape'][:2]
        trans_x = 0
        trans_y = 0
        if ori_img_h > ori_img_w:
            trans_x = (self.fixed_size - ori_img_w) // 2
        else:
            trans_y = (self.fixed_size - ori_img_h) // 2
        img_shape = results['img'].shape[:2]
        random_shift_x = trans_x
        random_shift_y = trans_y
        new_x = max(0, random_shift_x)
        orig_x = max(0, -random_shift_x)
        new_y = max(0, random_shift_y)
        orig_y = max(0, -random_shift_y)
        for key in results.get('bbox_fields', []):
            bboxes = results[key].copy()
            bboxes[..., 0::2] += random_shift_x
            bboxes[..., 1::2] += random_shift_y
            # clip border
            bboxes[..., 0::2] = np.clip(bboxes[..., 0::2], 0, img_shape[1])
            bboxes[..., 1::2] = np.clip(bboxes[..., 1::2], 0, img_shape[0])
            # remove invalid bboxes
            bbox_w = bboxes[..., 2] - bboxes[..., 0]
            bbox_h = bboxes[..., 3] - bboxes[..., 1]
            valid_inds = (bbox_w > self.filter_thr_px) & (
                bbox_h > self.filter_thr_px)
            # If the shift does not contain any gt-bbox area, skip this
            # image.
            if key == 'gt_bboxes' and not valid_inds.any():
                return results
            bboxes = bboxes[valid_inds]
            results[key] = bboxes
            # label fields, e.g. gt_labels and gt_labels_ignore
            label_key = self.bbox2label.get(key)
            if label_key in results:
                results[label_key] = results[label_key][valid_inds]
        for key in results.get('img_fields', ['img']):
            img = results[key]
            new_img = np.zeros_like(img)
            img_h, img_w = img.shape[:2]
            new_h = img_h - np.abs(random_shift_y)
            new_w = img_w - np.abs(random_shift_x)
            new_img[new_y:new_y + new_h, new_x:new_x + new_w] = \
                img[orig_y:orig_y + new_h, orig_x:orig_x + new_w]
            results[key] = new_img
        for key in results.get('mask_fields', []):  # gt_masks
            if results[key] is None:
                continue
            if ori_img_h > ori_img_w:
                results[key] = results[key].translate(
                    out_shape=results['pad_shape'][:2],
                    offset=trans_x,
                    direction='horizontal')
            else:
                results[key] = results[key].translate(
                    out_shape=results['pad_shape'][:2],
                    offset=trans_y,
                    direction='vertical')
        return results

    def __repr__(self):
        repr_str = self.__class__.__name__
        repr_str += f'(max_shift_px={self.max_shift_px})'
        return repr_str
```
After debugging, I found that the BitmapMasks of that image has 556 channels, i.e. the image contains 556 instances, which is too many. So I wrote a script and found that the maximum number of channels cv2.warpAffine can handle is 512; any image whose mask has more than 512 channels will cause this error.
Reproduces the problem - command or script
```python
import numpy as np
import cv2

img = np.ones((1024, 1024, 1024))
translate_matrix = np.float32([[1, 0, 0], [0, 1, 0]])
translated = cv2.warpAffine(
    img,
    translate_matrix,
    (1024, 1024))
print(translated.shape)
```
Reproduces the problem - error message
```
File "mmdetection/mmdet/datasets/custom.py", line 220, in __getitem__
    data = self.prepare_train_img(idx)
File "mmdetection/mmdet/datasets/custom.py", line 243, in prepare_train_img
    return self.pipeline(results)
File "mmdetection/mmdet/datasets/pipelines/compose.py", line 41, in __call__
    data = t(data)
File "mmdetection/mmdet/datasets/pipelines/transforms.py", line 575, in __call__
    results[key] = results[key].translate(
File "mmdetection/mmdet/core/mask/structures.py", line 420, in translate
    translated_masks = mmcv.imtranslate(
File "/data1/opt/.cache/matlab/anaconda3/lib/python3.8/site-packages/mmcv/image/geometric.py", line 719, in imtranslate
    translated = cv2.warpAffine(
cv2.error: OpenCV(4.5.4-dev) /tmp/pip-req-build-h45n7_hz/opencv/modules/imgproc/src/imgwarp.cpp:2595: error: (-215:Assertion failed) src.cols > 0 && src.rows > 0 in function 'warpAffine'
```
Additional information
I use opencv-python==4.5.4-dev. I think this is a limitation of OpenCV, but how can we work around it in mmcv? Looking forward to your reply.
I met the same problem.
I was also troubled by this issue for a long time today. The root cause I found is as follows: navigate to site-packages/mmcv/image/geometric.py and look at the imrotate function (in the traceback above, imtranslate takes the same path). If the number of channels of the input img exceeds CV_CN_MAX (512), the error you encountered occurs.
I hope this information is helpful.