DALI
Get the crop size and position in RandomResizedCrop
Hi. Is there any way to get the crop size (crop_x, crop_y) and position (crop_pos_x, crop_pos_y) along with the image from RandomResizedCrop?
If the answer is no, I think I need to randomly generate the parameters crop_x, crop_y, crop_pos_x, crop_pos_y in define_graph and pass them to the Crop operator. That seems cumbersome, because the similar function get_params in torchvision's RandomResizedCrop contains a for loop. It seems that a for loop (and an if condition) would be ignored in the DALI graph.
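For reference, the loop in question can be sketched in plain Python. This is a minimal sketch of the rejection-sampling scheme that torchvision's get_params follows (try a few random area/aspect-ratio pairs, fall back to a central crop); the function name sample_crop_params, the rng argument, and the return order are assumptions made for illustration, and none of this is DALI code.

```python
import math
import random


def sample_crop_params(height, width, scale=(0.08, 1.0), ratio=(3 / 4, 4 / 3),
                       attempts=10, rng=random):
    """Return (crop_pos_y, crop_pos_x, crop_y, crop_x) for one random crop.

    Hypothetical helper: mirrors the try-then-fallback structure of
    torchvision's RandomResizedCrop.get_params, without any tensors.
    """
    area = height * width
    log_lo, log_hi = math.log(ratio[0]), math.log(ratio[1])
    for _ in range(attempts):
        # Sample a target area and a log-uniform aspect ratio (width / height).
        target_area = area * rng.uniform(scale[0], scale[1])
        aspect = math.exp(rng.uniform(log_lo, log_hi))
        crop_w = int(round(math.sqrt(target_area * aspect)))
        crop_h = int(round(math.sqrt(target_area / aspect)))
        # Accept the candidate only if it fits inside the image.
        if 0 < crop_w <= width and 0 < crop_h <= height:
            top = rng.randint(0, height - crop_h)
            left = rng.randint(0, width - crop_w)
            return top, left, crop_h, crop_w
    # Fallback: central crop with the aspect ratio clamped to the allowed range.
    in_ratio = width / height
    if in_ratio < min(ratio):
        crop_w = width
        crop_h = int(round(crop_w / min(ratio)))
    elif in_ratio > max(ratio):
        crop_h = height
        crop_w = int(round(crop_h * max(ratio)))
    else:
        crop_w, crop_h = width, height
    return (height - crop_h) // 2, (width - crop_w) // 2, crop_h, crop_w
```

The data-dependent loop and branch are what make this awkward to express as a static graph: the number of attempts actually used differs from sample to sample.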
Currently, there's no way to get the crop anchor and shape from RandomResizedCrop.
Could you explain why you need the crop parameters after the image has already been cropped? Some explanation would help us suggest an alternative solution, or even prioritize this feature if we consider it necessary. It should be fairly simple to extend the operator to produce those on demand.
Regarding loops, you are right. They wouldn't work inside the graph definition.
Hi, thanks for the quick response. I need the crop anchor and shape because I use them in my algorithms. Previously I used torchvision for this, but torchvision is too slow, so I am switching to DALI for data augmentation. Since my research project is still ongoing, I cannot provide further details until it is published. Could you give me a hint on where I should modify the DALI source code in order to get the anchor and shape? I would then compile the modified DALI on my machine.
You need to edit:
https://github.com/NVIDIA/DALI/blob/master/dali/operators/image/resize/random_resized_crop.h
https://github.com/NVIDIA/DALI/blob/master/dali/operators/image/resize/random_resized_crop.cc
https://github.com/NVIDIA/DALI/blob/master/dali/operators/image/resize/random_resized_crop.cu
The operator already has a crops_ member with the data you need; you just have to create two additional outputs for the operator and copy the information there.
You can get some inspiration from https://github.com/NVIDIA/DALI/blob/master/dali/operators/image/crop/bbox_crop.cc (lines 528-540) which already outputs crop anchor and shape.
@jantonguirao Thank you very much! I will have a try.
Hi. Since I have another question that is also related to getting output from DALI, I am posting it here. I want to get a unique id for each image in the dataset. I have checked file_reader_op.h: https://github.com/NVIDIA/DALI/blob/master/dali/operators/reader/file_reader_op.h
It seems to me that the idx in line 35 is the unique id of each image that I want to get. Is that correct? If so, I think I can output the id by adding the following lines to file_reader_op.h, just like how the code outputs labels:
const int idx = ws.data_idx();
auto &id_output = ws.Output<CPUBackend>(2);
id_output.Resize({1});
id_output.mutable_data<int>()[0] = idx;
Is my implementation correct?
It seems to me that the idx in line 35 is the unique id of each image I want to get. Is that correct?
That is not correct. This is just the index of the sample within the batch.
If you are not using the labels, you could create one label per image and that would be your unique identifier. If that's not possible, you'll have to extend FileReader/FileLoader to produce unique ids.
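The one-label-per-image suggestion can be sketched as follows. The file reader accepts a file_list text file in which each line holds a file path and an integer label, so writing the running index as the label makes the label output act as a unique id. The helper names build_file_list and write_file_list are hypothetical, and the sketch assumes paths without spaces.

```python
def build_file_list(image_paths):
    """Return lines of the form "<path> <id>", with ids 0..N-1.

    Paths are sorted first so that the same dataset always gets the same ids.
    """
    return [f"{path} {idx}" for idx, path in enumerate(sorted(image_paths))]


def write_file_list(image_paths, list_path):
    """Write the list to disk and return the lines that were written."""
    lines = build_file_list(image_paths)
    with open(list_path, "w") as f:
        f.write("\n".join(lines) + "\n")
    return lines
```

The resulting file would then be passed as file_list to the reader (for example fn.readers.file(file_root=images_dir, file_list=list_path)), and the second output of the reader carries the id of each sample.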
Hi @jantonguirao,
Could you explain why you need the crop parameters after the image has already been cropped? Some explanation would help us suggest an alternative solution, or even prioritize this feature if we consider it necessary. It should be fairly simple to extend the operator to produce those on demand.
For example, in SimCLR (https://arxiv.org/pdf/2002.05709.pdf), they generate 2 random crops, which end up in either case (a) or case (b).
However, if I only want case (a), which I get by first producing view B and then computing a candidate for view A, I first need to know the parameters of view B.
One solution could be derived from the code implementation in torchvision but I guess it would be significantly slower than the current version.
I would also like to generate A based on the original image I, and not based on the cropped and resized version of B, i.e. not something like:
crop_B = fn.random_resized_crop(images, device="gpu", size=crop_size, random_area=[0.02, 1.0], random_aspect_ratio=[3/4, 4/3])
crop_A = fn.random_resized_crop(crop_B, device="gpu", size=crop_size, random_area=[0.02, 1.0], random_aspect_ratio=[3/4, 4/3])
I guess that we could support a use case like that by adding a basic operator that exposes the random crop region without cropping the image. The cropping window can then be used as the ROI for the resize operator:
# Extracts the image shape
image_shapes = fn.peek_image_shape(...) # something like [800, 600, 3]
# Decode images
images = fn.decoders.image(...)
# The new operator. Generates a random anchor and shape, given the maximum dimensions of the region
start_B, shape_B = fn.random_crop_generator(image_shapes[:2], random_area=[0.02, 1.0], random_aspect_ratio=[3/4, 4/3])
# For the A region, we use the shape of B as the maximum dimensions
rel_start_A, shape_A = fn.random_crop_generator(shape_B, random_area=[0.02, 1.0], random_aspect_ratio=[3/4, 4/3])
# Calculate the absolute start value for A by adding the A offset to the start of B
start_A = start_B + rel_start_A
# Calculate end from start and shape
end_A = start_A + shape_A
end_B = start_B + shape_B
# Resize from a region of interest
B = fn.resize(images, size=crop_size, roi_start=start_B, roi_end=end_B)
A = fn.resize(images, size=crop_size, roi_start=start_A, roi_end=end_A)
random_crop_generator can be easily implemented, as the crop generation logic is already abstracted away in the class RandomCropAttr. We just need to write a little bit of code to expose an operator. If you are willing to contribute to DALI, I can guide you through the necessary steps. Otherwise we can implement it, but we can't promise any particular timeline for this.
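The nested-window arithmetic in the proposal above can be checked outside DALI with a small plain-Python sketch. Here random_window is a hypothetical stand-in for the proposed operator: it samples a uniformly random shape and anchor and does not reproduce the area and aspect-ratio logic of RandomCropAttr. The point is only that adding the relative start of A to the start of B keeps A inside B.

```python
import random


def random_window(max_shape, rng):
    """Hypothetical stand-in: return (start, shape) inside a region of max_shape."""
    shape = [rng.randint(1, dim) for dim in max_shape]
    start = [rng.randint(0, dim - s) for dim, s in zip(max_shape, shape)]
    return start, shape


def nested_windows(image_hw, rng):
    """Return ((start_A, end_A), (start_B, end_B)) in absolute image coordinates."""
    start_b, shape_b = random_window(image_hw, rng)
    # A is sampled relative to B, using the shape of B as the maximum extent.
    rel_start_a, shape_a = random_window(shape_b, rng)
    # Shift A by the anchor of B to get absolute coordinates.
    start_a = [sb + ra for sb, ra in zip(start_b, rel_start_a)]
    end_a = [s + d for s, d in zip(start_a, shape_a)]
    end_b = [s + d for s, d in zip(start_b, shape_b)]
    return (start_a, end_a), (start_b, end_b)
```

Calling nested_windows([600, 800], random.Random(0)) returns two [y, x] boxes, which play the role of roi_start and roi_end for the two resize calls.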
@jantonguirao this looks like the code I've made with torchvision.
I'm always happy to contribute, but my cpp skills are quite low (I can get some help).
@rvandeghen This PR enables the feature that you need (fn.random_crop_generator): https://github.com/NVIDIA/DALI/pull/5304
You could then do:
encoded, _ = fn.readers.file(file_root=images_dir)
shapes = fn.peek_image_shape(encoded)
images = fn.decoders.image(encoded)
start_B, shape_B = fn.random_crop_generator(shapes, seed=seed0)
end_B = start_B + shape_B
rel_start_A, shape_A = fn.random_crop_generator(shape_B, seed=seed1)
start_A = start_B + rel_start_A
end_A = start_A + shape_A
B = fn.resize(images, size=crop_size, roi_start=start_B, roi_end=end_B)
A = fn.resize(images, size=crop_size, roi_start=start_A, roi_end=end_A)
Once it is merged you can access it via our nightly builds. I'll come back to you once this happens.
DALI 1.35 is available. Please check out the newly introduced operator.