[FEATURE] Could you add an op that supports crop, resize, and normalization? It would be very useful for inference. Currently I can only find an op for randomCropResize.

Open qihang720 opened this issue 1 year ago • 4 comments

In my use case, I obtain rectangles from YOLO, crop the corresponding regions from the original image, resize and normalize them, and then send the results to ResNet.

So I hope CV-CUDA can provide a single op that performs these steps.
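For reference, the requested crop, resize, and normalize sequence can be sketched on the CPU in plain Python. This is only an illustration of the intended semantics, not CV-CUDA code: the function name `crop_resize_normalize`, the `(x, y, w, h)` rectangle layout, the single-channel image, and nearest-neighbor sampling are all assumptions made for the example.

```python
def crop_resize_normalize(image, rect, out_size, mean, std):
    """Crop `rect` from `image`, resize it to `out_size`, then normalize.

    image:    H x W list of rows (single channel), hypothetical layout
    rect:     (x, y, w, h) in pixels, as a detector such as YOLO might return
    out_size: (out_w, out_h) expected by the downstream model
    """
    x0, y0, w, h = rect
    out_w, out_h = out_size
    result = []
    for v in range(out_h):
        # Map the output row back to a source row inside the rectangle
        # (nearest-neighbor sampling at pixel centers).
        src_y = y0 + min(h - 1, int((v + 0.5) * h / out_h))
        row = []
        for u in range(out_w):
            src_x = x0 + min(w - 1, int((u + 0.5) * w / out_w))
            # Normalize each sampled pixel: (value - mean) / std.
            row.append((image[src_y][src_x] - mean) / std)
        result.append(row)
    return result


# Usage: one output per detected rectangle.
image = [[y * 4 + x for x in range(4)] for y in range(4)]
rects = [(1, 1, 2, 2), (0, 0, 4, 4)]
crops = [crop_resize_normalize(image, r, (2, 2), mean=0.0, std=1.0) for r in rects]
```

A fused GPU operator would do the same per-pixel work in one pass, which avoids writing the intermediate cropped and resized images to memory.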

qihang720 avatar Sep 08 '23 05:09 qihang720

Can you provide more details?

Is the ask here a crop that takes a list of rects and a single input image, and outputs multiple images, one per cropped rect?

pmikolajczyk avatar Sep 22 '23 22:09 pmikolajczyk

> Can you provide more details?
>
> is the ask here to have a crop that takes in a list of rects, a single input image, and an output multiple images that are the cropped rects?

Hi there. A "crop that takes in a list of rects, a single input image" is indeed very important for us. Do you have plans to support this kind of crop?

tp-nan avatar Dec 29 '23 07:12 tp-nan

@tp-nan - We will look into adding a single operator that does Crop-Resize-Normalize. Thanks for your feedback and request. Regarding multiple cropped rectangles: would an operation like non-max suppression (NMS) be useful in your case, to pick the best bounding box from YOLO and use it to crop the image? If so, we have an operator for NMS; please take a look at that. If not, please provide more details on the use case for feeding multiple cropped images into a ResNet model.
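To make the NMS suggestion concrete, here is a minimal CPU sketch of generic greedy non-max suppression. It is not the CV-CUDA NMS operator or its API. The `(x1, y1, x2, y2)` box layout, the function names, and the 0.5 threshold are assumptions chosen for illustration.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Return indices of the boxes to keep, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with any kept box.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)  # the second box overlaps the first and is dropped
```

Note that NMS only removes overlapping duplicates; several distinct boxes can still survive, so a multi-rect crop remains relevant afterwards.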

shiremathNV avatar Jan 24 '24 19:01 shiremathNV

Hi @shiremathNV, thanks for your reply.

> If so, we have an operators for NMS, please take a look at that

That would be nice, but I cannot find the code.

> provide more details on the use case for multiple cropped images feeding into ResNet model.

For example, in our text detection scenario, post-processing and NMS (run on the CPU) produce hundreds of horizontal and rotated bounding boxes. Cropping each text box individually, or applying an affine transformation to each one, is expensive, so we need to crop and resize many boxes in only a few calls.

As for normalization, it is typically integrated into TensorRT model inference (the text recognition model), so we hope there will be an option to turn it off, or a separate operator that does not include normalization.
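As a rough illustration of this request, the sketch below crops and resizes a whole list of rotated boxes in one call and makes normalization optional. It is a plain-Python CPU reference, not a CV-CUDA API. The name `crop_rotated_boxes`, the `(cx, cy, w, h, angle)` box layout with the angle in radians, the single-channel image, and nearest-neighbor sampling are all assumptions.

```python
import math


def crop_rotated_boxes(image, boxes, out_size, normalize=None):
    """Crop and resize every box from one image in a single call.

    image:     H x W list of rows (single channel), hypothetical layout
    boxes:     list of (cx, cy, w, h, angle) with angle in radians
    out_size:  (out_w, out_h) shared by all outputs
    normalize: None to skip normalization, or a (mean, std) pair
    """
    height, width = len(image), len(image[0])
    out_w, out_h = out_size
    crops = []
    for cx, cy, w, h, angle in boxes:
        cos_a, sin_a = math.cos(angle), math.sin(angle)
        crop = []
        for v in range(out_h):
            # Offset of this output row from the box center, in source pixels.
            dy = ((v + 0.5) / out_h - 0.5) * h
            row = []
            for u in range(out_w):
                dx = ((u + 0.5) / out_w - 0.5) * w
                # Rotate the offset, then translate to the box center.
                sx = cx + dx * cos_a - dy * sin_a
                sy = cy + dx * sin_a + dy * cos_a
                # Nearest-neighbor sample, clamped to the image borders.
                px = min(width - 1, max(0, math.floor(sx)))
                py = min(height - 1, max(0, math.floor(sy)))
                value = image[py][px]
                if normalize is not None:
                    mean, std = normalize
                    value = (value - mean) / std
                row.append(value)
            crop.append(row)
        crops.append(crop)
    return crops


image = [[y * 4 + x for x in range(4)] for y in range(4)]
boxes = [(2, 2, 2, 2, 0.0), (2, 2, 2, 2, math.pi / 2)]
raw = crop_rotated_boxes(image, boxes, (2, 2))  # normalization left to the model
```

Passing `normalize=None` leaves pixel values untouched, which matches the case where the TensorRT engine already applies normalization.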

tp-nan avatar Jan 25 '24 01:01 tp-nan