DALI Crop multiple images from one image

Hello!

I'm trying to prepare preprocessing pipeline for MMPose network - specifically for hrnet48.

Logic of preprocessing without DALI: For mmpose operation, an additional object detector is used. Object detections are then cropped from the main image. Then a batch is formed from these cropped out detections and fed into the hrnet48 network.

The point is that the batch depends on the number of people in the frame - it is always different, dynamic. That is why I read it from fn.external_source along with an image.

I try to crop a one big image with some count of persons on it using fn.crop operation with parameters crop_pos_x and crop_pos_y. It works fine for one person (when I use crop_pos_x=persons_list[0][0]/1920, crop_pos_y=persons_list[0][3]/1080 instead of crop_pos_x=persons_list[0]/1920, crop_pos_y=persons_list[3]/1080, for example), but how to pass list of coordinates to crop to cut multiple bboxes from image?

According DALI documentation it is possible, for example, crop_pos_x (float or TensorList of float, optional, default = 0.5). But list does not work for me.

So the question is - how to use fn.crop function to crop many images from original image? How to pass a list of coordinates into fn.crop?

Source code:

class ExternalInputIterator(object):
    def __init__(self, image, persons):
        self.image = image
        self.persons = persons

    def __iter__(self):
        return self

    def __next__(self):
        return self.image, self.persons

    def update(self, image, persons):
        self.image = image
        self.persons = persons

Code from loop over video:

global eii

# make batch image of (1, 1920, 1080, 3) shape
dali_image = np.expand_dims(image, axis=0).astype(np.uint8)

# make batch persons of (1, N, 5) shape, where N - count of persons detected on frame. 5 - x1,y1,x2,y2,confidence.
dali_persons = np.expand_dims(np.stack(persons_result, axis=0), axis=0).astype(np.float32)
eii.update(image=dali_image, persons=dali_persons)

batch_dali = dali_pipeline.run()
out_images = batch_dali[0].as_array()
cv2.imwrite(f"dali_images/img_{i}.jpg", out_images[0])
out_persons = batch_dali[1].as_array()

Error, when I try to pass list of coordinates into fn.crop:

dummy_img = np.random.randint(2, size=(1, 1920, 1080))
dummy_img = dummy_img.astype(np.uint8)
eii = ExternalInputIterator(image=dummy_img, persons=np.expand_dims(np.array([1, 2], dtype=np.float32), axis=0))

@pipeline_def(batch_size=1, num_threads=4, device_id=0)
def mmpose_preprocess_pipeline():
    """
    Prepares pipeline of operations.
    :param batch_size: size of maximum batch for new DALI model.
    :param num_threads: number of CPU threads to be used.
    :param device_id: ID of GPU.
    :return: preprocessed images.
    """
    crop = (256, 192)
    image, persons_list = fn.external_source(source=eii, num_outputs=2, dtype=[types.UINT8, types.FLOAT])
    cropped = fn.crop(image, crop_pos_x=persons_list[0]/1920, crop_pos_y=persons_list[3]/1080, crop=crop, out_of_bounds_policy="pad", device="cpu")
    return cropped, persons_list

def create_pipeline(batch_size: int = 256, num_threads: int = 4, device_id: int = 0):
    """
    Builds and serialize pipeline of operations to DALI model.
    :param model_func_obj: function object pf model pipeline.
    :param path_to_save: path to save DALI model.
    :param batch_size: size of maximum batch for new DALI model.
    :param num_threads: number of CPU threads to be used.
    :param device_id: ID of GPU.
    :return: NoReturn.
    """
    print('Building pipeline...')
    pipeline = mmpose_preprocess_pipeline(batch_size=batch_size, num_threads=num_threads, device_id=device_id)
    pipeline.build()
    print('Done!')

    return pipeline

Building pipeline...
Done!
/usr/local/lib/python3.8/dist-packages/torch/nn/functional.py:718: UserWarning: Named tensors and all their associated APIs are an experimental feature and subject to change. Please do not use them for anything important until they are released as stable. (Triggered internally at  /pytorch/c10/core/TensorImpl.h:1156.)
  return torch.max_pool2d(input, kernel_size, stride, padding, dilation, ceil_mode)
Traceback (most recent call last):
  File "/home/user/mmpose/pe_solutions/mmpose/dali_demo.py", line 189, in <module>
    img_det, fps, person_count = process_img(pose_model, det_model, img, dali_pipeline=pipe, i=i)
  File "/home/user/mmpose/pe_solutions/mmpose/dali_demo.py", line 94, in process_img
    batch_dali = dali_pipeline.run()
  File "/usr/local/lib/python3.8/dist-packages/nvidia/dali/pipeline.py", line 922, in run
    return self.outputs()
  File "/usr/local/lib/python3.8/dist-packages/nvidia/dali/pipeline.py", line 821, in outputs
    return self._outputs()
  File "/usr/local/lib/python3.8/dist-packages/nvidia/dali/pipeline.py", line 905, in _outputs
    return self._pipe.Outputs()
RuntimeError: Critical error in pipeline:
Error when executing CPU operator TensorSubscript, instance name: "__TensorSubscript_4", encountered:
[/opt/dali/dali/operators/generic/slice/subscript.h:214] Index 3 is out of range for axis 0 of length 3
Detected while processing sample #0 of shape (3 x 5)
Stacktrace (11 entries):
[frame 0]: /usr/local/lib/python3.8/dist-packages/nvidia/dali/libdali_operators.so(+0x594bc2) [0x7fc06292cbc2]
[frame 1]: /usr/local/lib/python3.8/dist-packages/nvidia/dali/libdali_operators.so(+0xfa60bb) [0x7fc06333e0bb]
[frame 2]: /usr/local/lib/python3.8/dist-packages/nvidia/dali/libdali_operators.so(+0xfa6b57) [0x7fc06333eb57]
[frame 3]: /usr/local/lib/python3.8/dist-packages/nvidia/dali/libdali.so(void dali::Executor<dali::AOT_WS_Policy<dali::UniformQueuePolicy>, dali::UniformQueuePolicy>::RunHelper<dali::HostWorkspace>(dali::OpNode&, dali::HostWorkspace&)+0x7bd) [0x7fc08256d2cd]
[frame 4]: /usr/local/lib/python3.8/dist-packages/nvidia/dali/libdali.so(dali::Executor<dali::AOT_WS_Policy<dali::UniformQueuePolicy>, dali::UniformQueuePolicy>::RunCPUImpl()+0x218) [0x7fc0825720e8]
[frame 5]: /usr/local/lib/python3.8/dist-packages/nvidia/dali/libdali.so(dali::Executor<dali::AOT_WS_Policy<dali::UniformQueuePolicy>, dali::UniformQueuePolicy>::RunCPU()+0xe) [0x7fc082572bae]
[frame 6]: /usr/local/lib/python3.8/dist-packages/nvidia/dali/libdali.so(+0xb778d) [0x7fc08252978d]
[frame 7]: /usr/local/lib/python3.8/dist-packages/nvidia/dali/libdali.so(+0x12f8bc) [0x7fc0825a18bc]
[frame 8]: /usr/local/lib/python3.8/dist-packages/nvidia/dali/libdali.so(+0x712b6f) [0x7fc082b84b6f]
[frame 9]: /usr/lib/x86_64-linux-gnu/libpthread.so.0(+0x8609) [0x7fc1f2198609]
[frame 10]: /usr/lib/x86_64-linux-gnu/libc.so.6(clone+0x43) [0x7fc1f22d2133]

Current pipeline object is no longer valid.

Process finished with exit code 1

Mar 22 '23 14:03 Tpoc311

Hi @Tpoc311,

Thank you for reaching out. Currently, it is not possible to achieve it directly in DALI. The assumption is that one sample at the input can produce only one slice at the output. However thanks to the recently introduced conditional execution it is possible to provide something working similarly:

@experimental.pipeline_def(enable_conditionals=True)
def test_pipe():
    img = fn.external_source(name="image")
    pers = fn.external_source(name="person")
    anch = pers[:, 0:2]
    shape = pers[:, 2:4] - pers[:, 0:2]
    if fn.shapes(pers)[0] == 0:
        out = fn.reshape(img, src_dims=[-1, 0, 1, 2])
    elif fn.shapes(pers)[0] == 1:
        out = fn.slice(img, anch[0], shape[0])
        out = fn.reshape(out, src_dims=[-1, 0, 1, 2])
    elif fn.shapes(pers)[0] == 2:
        out1 = fn.slice(img, anch[0], shape[0])
        out2 = fn.slice(img, anch[1], shape[1])
        out = fn.stack(out1, out2)
    else: # 3 people
        out1 = fn.slice(img, anch[0], shape[0])
        out2 = fn.slice(img, anch[1], shape[1])
        out3 = fn.slice(img, anch[2], shape[2])
        out = fn.stack(out1, out2, out3)
    return out, anch, shape

So depending on numer of cropping windows you can call slice multiple times and then stack the results.

Mar 22 '23 17:03 JanuszL

Hi, @JanuszL!

Thank you for paying attention to the issue.

This method would be applicable for batch size 5 or 10. But potentially I will get up to 40 batch size. So the batch ranges from 1 to 40.

The ideal scheme for the operation of this pipeline: Put one original image and a list of coordinates for people's bboxes as input. And at the output, get cut out images of these people according to their bboxes.

For now I gonna use the next approach: I will put batch of pairs [source_image, coord_of_bbox] and get the same output - cut out images of these people according to their bboxes.

The obvious disadvantage of this approach is that I have to put N-1 extra source images into pipeline. If there is any other way to solve this issue, I would be very grateful.

Apr 11 '23 10:04 Tpoc311

Hi @Tpoc311,

The obvious disadvantage of this approach is that I have to put N-1 extra source images into pipeline. If there is any other way to solve this issue, I would be very grateful.

For now, I'm afraid there is no other way to handle this. I will add a multi-crop/slice operator to our ToDo list.

Apr 11 '23 10:04 JanuszL

Hi, @JanuszL,

That would be great! May I ask you for one more favor? Could you also add an fn.warp_affine operator to the task list? It can also be used to cut images. And transforms which prepare tranformation matrix, such as:

Thank you in advance!

Apr 13 '23 08:04 Tpoc311

Hi @Tpoc311,

Thank you for the suggestion. The goal of this section is to describe how to prepare a transformation matrix for the fn.warp_affine operator. Putting it here will mix two different kinds of operations.

Apr 13 '23 08:04 JanuszL

@JanuszL, thank you for explaining all this stuff! I have no more questions at the moment

Apr 13 '23 09:04 Tpoc311

I'm running into the same issue and would most likely be able to solve it with a new operator multi_crop_mirror_normalize (or if that's not possible, multi_crop or multi_slice).

Dec 13 '23 08:12 bveldhoen

@bveldhoen, Hi! Keep in touch pls, it would be interesting to know the result

Jan 04 '24 16:01 Tpoc311