DALI decode image sequence

Hi, I want to run an external source pipeline on an image sequence. Can you tell me how to decode a sequence of images, or show me an example? Thanks.

Feb 18 '22 07:02 sjbling

Hi @sjbling.

You can try something like the code below. The external source operator returns sequence_length outputs plus one for the labels. The encoded images are then decoded and stacked together to form a sequence. It is worth knowing that DALI internally creates one decoder instance for each input it decodes, so in this case there are 4 decoder instances under the hood, which increases memory utilization.

import numpy as np
import nvidia.dali.fn as fn
from nvidia.dali import pipeline_def
import os

batch_size = 10
sequence_length = 4

test_data_root = os.environ['DALI_EXTRA_PATH']
jpeg_file = os.path.join(test_data_root, 'db', 'single', 'jpeg', '510', 'ship-1083562_640.jpg')

def get_data(sample_info):
    out = [np.fromfile(jpeg_file, dtype=np.uint8) for _ in range(sequence_length)]
    # add label
    out.append(np.array([1,2,3]))
    return out

@pipeline_def
def simple_pipeline():
    *jpegs, label = fn.external_source(source=get_data, num_outputs=sequence_length+1, parallel=True, batch=False)
    images = fn.decoders.image(jpegs, device="mixed", hw_decoder_load=1)
    sequence = fn.stack(*images)
    sequence = fn.reshape(sequence, layout="DHWC")
    return sequence, label

pipe = simple_pipeline(batch_size=batch_size, num_threads=4, prefetch_queue_depth=2, device_id=0)
pipe.build()
pipe.run()
out = pipe.run()
print(np.array(out[0][0].as_cpu()).shape)
print(np.array(out[1][0]))
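
As a shape-only illustration of the stacking step above, here is a NumPy analogue (not DALI code). The 480x640 frame size is a made-up value for the sketch; the point is only that stacking sequence_length HWC frames along a new leading axis yields one 4-D sequence tensor.

```python
import numpy as np

sequence_length = 4

# Hypothetical decoded frames in HWC layout (the size is an assumption).
frames = [np.zeros((480, 640, 3), dtype=np.uint8) for _ in range(sequence_length)]

# Stacking along a new leading axis gives (frames, height, width, channels).
sequence = np.stack(frames)
print(sequence.shape)  # (4, 480, 640, 3)
```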


Feb 18 '22 08:02 JanuszL

Thanks a lot. But if I use the external source pipeline, the PyTorch source returns an image sequence of shape (c, t, h, w), where t == sequence_length, plus label, idx, and extra_data. How should I set num_outputs? Are there any requirements on the output of the PyTorch iterator?

Feb 21 '22 06:02 sjbling

@sjbling I'm not sure I understand your problem correctly. You say that there's a CTWH tensor - doesn't that mean your images are already decoded? Please describe your desired pipeline in more detail. For a start, please specify:

what do you mean by "pytorch source"?
what are the inputs?
what are expected outputs?
what tools are you using to feed the external source?

Feb 21 '22 10:02 mzient

Thank you for the reminder. I followed the ExternalSource operator tutorial in the DALI docs (https://docs.nvidia.com/deeplearning/dali/user-guide/docs/examples/general/data_loading/parallel_external_source.html#Accepted-source). The "pytorch source" is a callable object created in PyTorch.
The inputs are the data returned by the callback: a list of decoded image tensors shaped (C,T1,W,H) and (C,T2,W,H), the label array, and extra metadata. When I run the pipeline, this error occurs: "AttributeError: 'list' object has no attribute 'shape'". If I replace the list, there is another error: "RuntimeError: [/opt/dali/dali/python/backend_impl.cc:143] Assert on "strides[i] == stride_from_shape" failed: Strided data not supported."

Feb 22 '22 01:02 sjbling

The error says that the ordering of axes doesn't reflect the layout in memory or that there's some extra padding there. It's very likely that the real data layout is TCHW or THWC (your T would be F - for frames - in DALI nomenclature). Can you print (and paste here) the shapes and strides of the tensors that you try to pass?
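
A small NumPy sketch of how such a stride mismatch can arise, assuming the C-first array is only a transposed view of frame-major data (the shapes here are made up for illustration). Copying the view with np.ascontiguousarray produces strides that match the shape:

```python
import numpy as np

# Hypothetical clip stored frame-major: F, H, W, C.
frames_fhwc = np.arange(2 * 4 * 5 * 3, dtype=np.float32).reshape(2, 4, 5, 3)

# Transposing only relabels the axes; the memory is still laid out as FHWC.
clip_cfhw = frames_fhwc.transpose(3, 0, 1, 2)
print(clip_cfhw.shape, clip_cfhw.strides, clip_cfhw.flags["C_CONTIGUOUS"])

# A dense copy rearranges the memory so the strides follow from the shape.
clip_dense = np.ascontiguousarray(clip_cfhw)
print(clip_dense.shape, clip_dense.strides, clip_dense.flags["C_CONTIGUOUS"])
```

Printing .shape and .strides of the real arrays in the same way shows whether they are dense.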

Feb 22 '22 09:02 mzient

The callback returns a list of length 2: [array(C,F1,H,W), array(C,F2,H,W)]. When I run the pipeline, this error occurs: "AttributeError: 'list' object has no attribute 'shape'".

Feb 23 '22 07:02 sjbling

Hmm... have you marked your external_source as having 2 outputs? This list looks like it could be consumed by a non-batched external source with 2 outputs:

data1, data2 = fn.external_source(your_callback, num_outputs=2, batch=False)

where your_callback produces data as shown in your last post.

I don't think I can help much more without seeing the actual code. Can you post a minimal reproduction of your problem?

Feb 23 '22 11:02 mzient

def __call__(self, sample_info):
    idx = sample_info.idx_in_epoch
    """
    Generate corresponding clips, boxes, labels and metadata for given idx.

    Args:
        idx (int): the video index provided by the pytorch sampler.
    Returns:
        frames (tensor): the frames of sampled from the video. The dimension
            is `channel` x `num frames` x `height` x `width`.
        label (ndarray): the label for correspond boxes for the current video.
        idx (int): the video index provided by the pytorch sampler.
        extra_data (dict): a dict containing extra data fields, like "boxes",
            "ori_boxes" and "metadata".
    """
    if sample_info.iteration >= self.num_videos:
        # Indicate end of the epoch
        raise StopIteration()

    video_idx, sec_idx, sec, center_idx = self._keyframe_indices[idx]
    # Get the frame idxs for current clip.
    seq = utils.get_sequence(
        center_idx,
        self._seq_len // 2,
        self._sample_rate,
        num_frames=len(self._image_paths[video_idx]),
    )

    clip_label_list = self._keyframe_boxes_and_labels[video_idx][sec_idx]
    assert len(clip_label_list) > 0

    # Get boxes and labels for current clip.
    boxes = []
    labels = []
    for box_labels in clip_label_list:
        boxes.append(box_labels[0])
        labels.append(box_labels[1])
    boxes = np.array(boxes)
    # Score is not used.
    boxes = boxes[:, :4].copy()
    ori_boxes = boxes.copy()

    image_paths = [self._image_paths[video_idx][frame] for frame in seq]
    imgs = utils.retry_load_images(
        image_paths, self.cfg, backend=self.cfg.AVA.IMG_PROC_BACKEND
    )

    fake_1_paths = [image_path.replace("frames", "fake_1_maps") for image_path in image_paths]
    fake_1_maps = utils.retry_load_fake_1_maps(
        fake_1_paths, self.cfg, backend=self.cfg.AVA.IMG_PROC_BACKEND
    )

    fake_2_paths = [image_path.replace("frames", "fake_2_mask") for image_path in image_paths]
    fake_2_paths = [fake_2_path.replace("jpg", "png") for fake_2_path in fake_2_paths]
    fake_2_maps = utils.retry_load_fake_1_maps(
        fake_2_paths, self.cfg, backend=self.cfg.AVA.IMG_PROC_BACKEND
    )

    # Preprocess images and boxes
    imgs, fake_1_maps, fake_2_maps, boxes = self._images_and_boxes_preprocessing_cv2(
        imgs, fake_1_maps, fake_2_maps, boxes=boxes
    )

    # Construct label arrays.
    label_arrs = np.zeros((len(labels), self._num_classes), dtype=np.int32)
    for i, box_labels in enumerate(labels):
        for label in box_labels:
            if label == -1:
                continue
            assert label >= 1 and label <= self.cfg.MODEL.NUM_CLASSES
            label_arrs[i][label - 1] = 1

    imgs = utils.pack_pathway_output(self.cfg, imgs)
    fake_1_maps = utils.pack_pathway_output(self.cfg, fake_1_maps)
    fake_2_maps = utils.pack_pathway_output(self.cfg, fake_2_maps)
    metadata = [[video_idx, sec]] * len(boxes)

    extra_data = {
        "boxes": boxes,
        "ori_boxes": ori_boxes,
        "metadata": metadata,
    }

    inputs = []
    if isinstance(imgs, (list,)):
        for i in range(len(imgs)):
            temp_tensor_x = torch.cat((imgs[i], fake_1_maps[i]), 0)
            temp_tensor = torch.cat((temp_tensor_x, fake_2_maps[i]), 0)
            temp_tensor = temp_tensor.float()
            inputs.append(temp_tensor)
    else:
        temp_tensor_x = torch.cat((imgs, fake_1_maps), 0)
        temp_tensor = torch.cat((temp_tensor_x, fake_2_maps), 0)
        inputs = temp_tensor
    inputs[0] = inputs[0].cpu().numpy()
    inputs[1] = inputs[1].cpu().numpy()

    if self.with_fake_1_mask == False:
        return imgs, label_arrs, idx, extra_data
    else:
        return inputs[0], label_arrs, np.array(idx)

This is my callback. The returned input is a list, and I use the following code to run it:
jpegs, labels, idx = fn.external_source(source=external_data, num_outputs=3, batch=False, device="gpu")
Your suggestion worked, but the callback return value can be a list, right? Why is there such an error ("AttributeError: 'list' object has no attribute 'shape'")?

Feb 24 '22 02:02 sjbling

@sjbling Sorry for the late reply.

"but the callback return can be a list, right?"

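No, not as a single output. Each output of an external source callback must be something that implements the Python buffer protocol (NumPy array, Torch tensor) or the CUDA array interface (CuPy array, Torch GPU tensor, etc.). A list is only accepted as the container of separate outputs when num_outputs is set, as in the two-output example earlier in this thread.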
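
A minimal sketch of normalizing callback results before returning them, assuming three outputs as in the num_outputs=3 call above. The helper names are hypothetical, not part of DALI; the only library call used is np.ascontiguousarray, which turns lists, scalars, or strided views into dense arrays.

```python
import numpy as np

def as_external_source_output(value, dtype=None):
    # Hypothetical helper: return a dense NumPy array for lists, scalars, or views.
    return np.ascontiguousarray(value, dtype=dtype)

def make_sample_outputs(frames, labels, idx):
    # Hypothetical helper: one dense array per declared output (num_outputs=3).
    return (
        as_external_source_output(frames, np.float32),
        as_external_source_output(labels, np.int32),
        as_external_source_output(idx, np.int64),
    )

outs = make_sample_outputs([[1.0, 2.0], [3.0, 4.0]], [[0, 1]], 7)
print([(o.shape, o.dtype) for o in outs])
```

A callback could end with return make_sample_outputs(...) so that no nested Python list reaches the external source.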

Feb 25 '22 17:02 mzient
