Extracting Randomly Generated Parameters from "fn.decoders.image_random_crop"?

Open • AhmedHussKhalifa opened this issue 2 years ago • 1 comment

Describe the question.

Hello,

I am currently working on training on ImageNet within the KD framework, using the following "create_dali_pipeline" function.

My objective is to extract the augmentation parameters that are applied to the original image during training, so that I can store and utilize them for offline training purposes, specifically the coordinates generated by "fn.decoders.image_random_crop".

While fixing the seed can generate the same augmentation parameters per epoch, it doesn't solve other potential issues that I have in my framework. Therefore, I am looking for alternative solutions to overcome these challenges.

If you have any insights or suggestions on how I can accomplish this, I would greatly appreciate your assistance. Thank you.

import nvidia.dali.fn as fn
import nvidia.dali.types as types

def create_dali_pipeline(data_dir, crop, size, shard_id, num_shards, NUM_QFs=0, dali_cpu=False, is_training=True):
    images, labels = fn.readers.file(file_root=data_dir,
                                     shard_id=shard_id,
                                     num_shards=num_shards,
                                     random_shuffle=is_training,
                                     pad_last_batch=True,
                                     name="Reader")

    dali_device = 'cpu' if dali_cpu else 'gpu'
    decoder_device = 'cpu' if dali_cpu else 'mixed'
    # ask nvJPEG to preallocate memory for the biggest sample in ImageNet for CPU and GPU to avoid reallocations in runtime
    device_memory_padding = 211025920 if decoder_device == 'mixed' else 0
    host_memory_padding = 140544512 if decoder_device == 'mixed' else 0
    # ask HW NVJPEG to allocate memory ahead for the biggest image in the data set to avoid reallocations in runtime
    preallocate_width_hint = 5980 if decoder_device == 'mixed' else 0
    preallocate_height_hint = 6430 if decoder_device == 'mixed' else 0
    images_org = fn.decoders.image_random_crop(images,
                                            device=decoder_device, output_type=types.RGB,
                                            device_memory_padding=device_memory_padding,
                                            host_memory_padding=host_memory_padding,
                                            preallocate_width_hint=preallocate_width_hint,
                                            preallocate_height_hint=preallocate_height_hint,
                                            random_aspect_ratio=[0.75, 4.0 / 3.0],
                                            random_area=[0.08, 1.0],
                                            num_attempts=100)
    images = fn.resize(images_org,
                        device=dali_device,
                        resize_x=crop,
                        resize_y=crop,
                        interp_type=types.INTERP_TRIANGULAR)
    mirror = fn.random.coin_flip(probability=0.5)
    original = fn.crop_mirror_normalize(images.gpu(),
                                    dtype=types.FLOAT,
                                    output_layout="CHW",
                                    crop=(crop, crop),
                                    mean=[0.485 * 255,0.456 * 255,0.406 * 255],
                                    std=[0.229 * 255,0.224 * 255,0.225 * 255],
                                    mirror=mirror)

AhmedHussKhalifa • Jun 20 '23 14:06

Hi @AhmedHussKhalifa,

Thank you for reaching out. One option is to rebuild the crop from simpler operators: use the peek image shape operator to read the shape of the input image, math operations to compute the valid range of the crop parameters, and a random generator to draw values from that range. The resulting values can then be passed to the slice operator. Note that this will not reproduce the exact behavior of random_crop, which is an iterative process.

Another approach is to use the external source operator to load the images, peek their shapes on the CPU, and compute the random_crop parameters in Python. These operators should not be compute-intensive, and the actual image decoding would still happen on the GPU.
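As a rough illustration of the "compute the parameters in Python" idea, here is a minimal sketch in plain Python with no DALI dependency. The function name `sample_random_crop` and its return layout are hypothetical. The sampling scheme (random area fraction, log-uniform aspect ratio, retry up to `num_attempts`, then a centered fallback) is a common random-resized-crop recipe and is an assumption here; it is not guaranteed to match what `fn.decoders.image_random_crop` does internally.

```python
import math
import random


def sample_random_crop(height, width, rng,
                       area_range=(0.08, 1.0),
                       aspect_ratio_range=(0.75, 4.0 / 3.0),
                       num_attempts=100):
    """Return (x, y, crop_w, crop_h) in pixels for an image of the given size.

    Hypothetical helper: an approximation of a random-resized-crop sampler,
    not DALI's exact algorithm.
    """
    ratio_lo, ratio_hi = aspect_ratio_range
    for _ in range(num_attempts):
        # Pick a target area as a fraction of the full image area.
        target_area = rng.uniform(*area_range) * height * width
        # Pick an aspect ratio (width / height), log-uniformly within the range.
        ratio = math.exp(rng.uniform(math.log(ratio_lo), math.log(ratio_hi)))
        crop_w = int(round(math.sqrt(target_area * ratio)))
        crop_h = int(round(math.sqrt(target_area / ratio)))
        # Accept the candidate only if it fits inside the image.
        if 0 < crop_w <= width and 0 < crop_h <= height:
            x = rng.randint(0, width - crop_w)
            y = rng.randint(0, height - crop_h)
            return x, y, crop_w, crop_h

    # Fallback: centered crop with the aspect ratio clamped into the range.
    image_ratio = width / height
    if image_ratio < ratio_lo:
        crop_w = width
        crop_h = max(1, int(round(width / ratio_lo)))
    elif image_ratio > ratio_hi:
        crop_h = height
        crop_w = max(1, int(round(height * ratio_hi)))
    else:
        crop_w, crop_h = width, height
    return (width - crop_w) // 2, (height - crop_h) // 2, crop_w, crop_h


# Example: one set of parameters for a 375 x 500 (H x W) image.
# A dedicated Random instance keeps the draw reproducible and loggable.
params = sample_random_crop(375, 500, random.Random(1234))
```

Because the values are generated outside the decoder, they can be written to disk next to the sample identifier and handed to a slice operator, which is what makes them reusable for offline training.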

JanuszL • Jun 20 '23 15:06