
[bug] DALI cannot handle the transfer of an empty tensor to the GPU

Open Freed-Wu opened this issue 2 years ago • 31 comments

I convert images (uint8) to features (int32, after quantization), then convert them to a Caffe LMDB. However, when I try to read them with DALI, I get this error:

RuntimeError: Critical error in pipeline:
Error when executing Mixed operator decoders__ImageRandomCrop encountered:
[/opt/dali/dali/pipeline/data/buffer.h:142] Assert on "type_.id() == TypeTable::GetTypeId<T>()" failed: Calling type does not match buffer data type, requested type: uint8 current buffer type:
int32. To set type for the Buffer use 'set_type<T>()' or Resize(shape, type) first.

How can fn.readers.caffe support data types other than uint8? Thanks.

Freed-Wu, Sep 14 '22 08:09

Hello @Freed-Wu, thanks for reaching out to us. It seems that you're passing something other than an array of bytes to the image decoder (fn.decoders.image_random_crop). If the image is already stored in raw form in the LMDB file, you don't need an image decoder. That's as much as I can guess from your question without looking at the pipeline code.

mzient, Sep 14 '22 09:09

without looking at the pipeline code.

Sorry, I forgot to include it :smile: Here is the code. Do you mean that, because the LMDB file contains raw data, removing all the image decoding code should solve this problem? Thanks!

from nvidia.dali import pipeline_def, fn, types

@pipeline_def
def create_dali_pipeline(
    data_dir,
    crop,
    size,
    shard_id,
    num_shards,
    dali_cpu=False,
    is_training=True,
):
    images, labels = fn.readers.caffe(
        name="Reader", path=data_dir, shard_id=shard_id, num_shards=num_shards
    )
    labels = labels.gpu()
    dali_device = "cpu" if dali_cpu else "gpu"
    decoder_device = "cpu" if dali_cpu else "mixed"
    # ask nvJPEG to preallocate memory for the biggest sample in ImageNet for CPU and GPU to avoid reallocations in runtime
    device_memory_padding = 211025920 if decoder_device == "mixed" else 0
    host_memory_padding = 140544512 if decoder_device == "mixed" else 0
    # ask HW NVJPEG to allocate memory ahead for the biggest image in the data set to avoid reallocations in runtime
    preallocate_width_hint = 5980 if decoder_device == "mixed" else 0
    preallocate_height_hint = 6430 if decoder_device == "mixed" else 0
    if is_training:
        images = fn.decoders.image_random_crop(
            images,
            device=decoder_device,
            output_type=types.RGB,
            device_memory_padding=device_memory_padding,
            host_memory_padding=host_memory_padding,
            preallocate_width_hint=preallocate_width_hint,
            preallocate_height_hint=preallocate_height_hint,
            random_aspect_ratio=[0.8, 1.25],
            random_area=[0.1, 1.0],
            num_attempts=100,
        )
        images = fn.resize(
            images,
            device=dali_device,
            resize_x=crop,
            resize_y=crop,
            interp_type=types.INTERP_TRIANGULAR,
        )
        mirror = fn.random.coin_flip(probability=0.5)
    else:
        images = fn.decoders.image(
            images, device=decoder_device, output_type=types.RGB
        )
        images = fn.resize(
            images,
            device=dali_device,
            size=size,
            mode="not_smaller",
            interp_type=types.INTERP_TRIANGULAR,
        )
        mirror = False

    images = fn.crop_mirror_normalize(
        images.gpu(),
        dtype=types.FLOAT,
        output_layout="CHW",
        crop=(crop, crop),
        mirror=mirror,
    )
    return images, labels

Freed-Wu, Sep 14 '22 09:09

If I remove fn.decoders.image(), the code throws this error:

RuntimeError: Critical error in pipeline:
Error when executing Mixed operator MakeContiguous, instance name: "__MakeContiguous_Reader[1]", encountered:
[/opt/dali/dali/pipeline/data/tensor_list.h:191] Assert on "IsValidType(new_type)" failed: TensorList cannot be resized with invalid type. To zero out the TensorList Reset() can be used.

Freed-Wu, Sep 14 '22 09:09

@Freed-Wu This last error is quite unexpected - and quite possibly a bug. Which version of DALI are you using?

mzient, Sep 14 '22 09:09

❯ pip show nvidia-dali-cuda110
Name: nvidia-dali-cuda110
Version: 1.16.0
Summary: NVIDIA DALI  for CUDA 11.0. Git SHA: 83da7876a646b6c081df8bb816be97b08db54612
Home-page: https://github.com/NVIDIA/dali
Author: NVIDIA Corporation
Author-email:
License: Apache License 2.0
Location: /home/wzy/.local/lib/python3.10/site-packages
Requires:
Required-by:

Freed-Wu, Sep 14 '22 09:09

@Freed-Wu It would be very helpful if you could share an example LMDB file that causes these problems. It can be smaller or contain made-up data. The fact that you cannot use it with the image decoder is expected: the decoder needs the whole image file loaded into memory, headers and everything, so it obviously requires a byte stream. However, the error without the image decoder is what worries me.

mzient, Sep 14 '22 09:09

This is test.py, with fn.decoders.image removed:

"""https://github.com/NVIDIA/DALI/issues/4254"""
from nvidia.dali import pipeline_def, fn, types
from nvidia.dali.plugin.pytorch import (
    DALIClassificationIterator,
    LastBatchPolicy,
)

@pipeline_def
def create_dali_pipeline(
    data_dir,
    crop,
    size,
    shard_id,
    num_shards,
    dali_cpu=False,
    is_training=True,
):
    images, labels = fn.readers.caffe(
        name="Reader", path=data_dir, shard_id=shard_id, num_shards=num_shards
    )
    labels = labels.gpu()
    dali_device = "cpu" if dali_cpu else "gpu"
    decoder_device = "cpu" if dali_cpu else "mixed"
    # ask nvJPEG to preallocate memory for the biggest sample in ImageNet for CPU and GPU to avoid reallocations in runtime
    device_memory_padding = 211025920 if decoder_device == "mixed" else 0
    host_memory_padding = 140544512 if decoder_device == "mixed" else 0
    # ask HW NVJPEG to allocate memory ahead for the biggest image in the data set to avoid reallocations in runtime
    preallocate_width_hint = 5980 if decoder_device == "mixed" else 0
    preallocate_height_hint = 6430 if decoder_device == "mixed" else 0
    if is_training:
        images = fn.decoders.image_random_crop(
            images,
            device=decoder_device,
            output_type=types.RGB,
            device_memory_padding=device_memory_padding,
            host_memory_padding=host_memory_padding,
            preallocate_width_hint=preallocate_width_hint,
            preallocate_height_hint=preallocate_height_hint,
            random_aspect_ratio=[0.8, 1.25],
            random_area=[0.1, 1.0],
            num_attempts=100,
        )
        images = fn.resize(
            images,
            device=dali_device,
            resize_x=crop,
            resize_y=crop,
            interp_type=types.INTERP_TRIANGULAR,
        )
        mirror = fn.random.coin_flip(probability=0.5)
    else:
        # remove fn.decoders.image
        images = fn.resize(
            images,
            device=dali_device,
            size=size,
            mode="not_smaller",
            interp_type=types.INTERP_TRIANGULAR,
        )
        mirror = False

    images = fn.crop_mirror_normalize(
        images.gpu(),
        dtype=types.FLOAT,
        output_layout="CHW",
        crop=(crop, crop),
        mirror=mirror,
    )
    return images, labels


def get_dataloader(
    args, data_dir, seed, is_training=True
):
    pipe = create_dali_pipeline(
        batch_size=args.batch_size,
        num_threads=args.workers,
        device_id=args.local_rank,
        seed=seed,
        data_dir=data_dir,
        crop=args.crop,
        size=args.size,
        dali_cpu=args.dali_cpu,
        shard_id=args.local_rank,
        num_shards=args.world_size,
        is_training=is_training,
    )
    pipe.build()
    return DALIClassificationIterator(
        pipe, reader_name="Reader", last_batch_policy=LastBatchPolicy.PARTIAL
    )


def _test(data_dir):
    from argparse import Namespace
    args = Namespace(
        batch_size=8,
        workers=1,
        local_rank=0,
        size=16,
        crop=14,
        dali_cpu=False,
        world_size=1,
    )
    train_dataloader = get_dataloader(
        args, data_dir, 0, False
    )
    out = train_dataloader.next()


if __name__ == '__main__':
    _test("/data/freedwu/imagenet4latent/train")

Use the following script to generate the latent features in /data/freedwu/imagenet4latent from ImageNet-Caffe:

"""https://github.com/BVLC/caffe/issues/1698#issuecomment-70211045"""
import os
from os.path import dirname as dirn
import sys
from argparse import Namespace

from compressai.zoo import image_models
from caffe.io import array_to_datum
import lmdb
from torch.autograd.grad_mode import no_grad

from datasets.dali.lmdb import get_dataloader  # noqa: E402


model = image_models["cheng2020-attn"](6, pretrained=True)
model = model.cuda()
model.eval()
batch_size = 4
args = Namespace(
    batch_size=batch_size,
    workers=16,
    local_rank=0,
    size=256,
    crop=256,  # not crop
    dali_cpu=False,
    world_size=1,
)
root_data_dir = "/data/bitahub/ILSVRC2012/ImageNet-Caffe"
data_dirs = [os.path.join(root_data_dir, file) for file in ["ilsvrc12_train_lmdb", "ilsvrc12_val_lmdb"]]
root_latent_dir = "/data/freedwu/imagenet4latent"
latent_dirs = [os.path.join(root_latent_dir, file) for file in ["train", "val"]]


def generate_lmdb(data_dir, latent_dir):
    os.makedirs(latent_dir, exist_ok=True)
    dataloader = get_dataloader(args, data_dir, 0, False)
    with lmdb.open(latent_dir, map_size=int(1e12)) as env, env.begin(write=True) as txn, no_grad():
        for i, data in enumerate(dataloader):
            samples, targets = data[0]["data"], data[0]["label"]
            latents = model.gaussian_conditional.quantize(
                model.g_a(samples), "symbols"
            ).cpu().numpy()
            for j, (latent, target) in enumerate(zip(latents, targets)):
                datum = array_to_datum(latent)
                datum.label = target.item()
                key = f"{i * batch_size + j:0>8d}"
                txn.put(key.encode(), datum.SerializeToString())


for data_dir, latent_dir in zip(data_dirs, latent_dirs):
    generate_lmdb(data_dir, latent_dir)

Freed-Wu, Sep 14 '22 10:09

So this is a DALI bug, right?

Freed-Wu, Sep 14 '22 12:09

@Freed-Wu,

It looks like a DALI bug, but I'm unable to create the test dataset the way you did (the script you shared lacks the get_dataloader definition). Can you create a toy dataset and share it in this thread? It would make debugging much easier.

JanuszL, Sep 14 '22 12:09

@Freed-Wu Can you trim your dataset to a handful of samples (or maybe even one sample) so you could post it here?

mzient, Sep 14 '22 14:09

Sorry for the delay. Here is test.py:

"""https://github.com/NVIDIA/DALI/issues/4254"""
from nvidia.dali import pipeline_def, fn, types
from nvidia.dali.plugin.pytorch import (
    DALIClassificationIterator,
    LastBatchPolicy,
)

@pipeline_def
def create_dali_pipeline(
    data_dir,
    crop,
    size,
    shard_id,
    num_shards,
    dali_cpu=False,
    is_training=True,
):
    images, labels = fn.readers.caffe(
        name="Reader", path=data_dir, shard_id=shard_id, num_shards=num_shards
    )
    labels = labels.gpu()
    dali_device = "cpu" if dali_cpu else "gpu"
    decoder_device = "cpu" if dali_cpu else "mixed"
    # ask nvJPEG to preallocate memory for the biggest sample in ImageNet for CPU and GPU to avoid reallocations in runtime
    device_memory_padding = 211025920 if decoder_device == "mixed" else 0
    host_memory_padding = 140544512 if decoder_device == "mixed" else 0
    # ask HW NVJPEG to allocate memory ahead for the biggest image in the data set to avoid reallocations in runtime
    preallocate_width_hint = 5980 if decoder_device == "mixed" else 0
    preallocate_height_hint = 6430 if decoder_device == "mixed" else 0
    if is_training:
        images = fn.decoders.image_random_crop(
            images,
            device=decoder_device,
            output_type=types.RGB,
            device_memory_padding=device_memory_padding,
            host_memory_padding=host_memory_padding,
            preallocate_width_hint=preallocate_width_hint,
            preallocate_height_hint=preallocate_height_hint,
            random_aspect_ratio=[0.8, 1.25],
            random_area=[0.1, 1.0],
            num_attempts=100,
        )
        images = fn.resize(
            images,
            device=dali_device,
            resize_x=crop,
            resize_y=crop,
            interp_type=types.INTERP_TRIANGULAR,
        )
        mirror = fn.random.coin_flip(probability=0.5)
    else:
        # remove fn.decoders.image
        images = fn.resize(
            images,
            device=dali_device,
            size=size,
            mode="not_smaller",
            interp_type=types.INTERP_TRIANGULAR,
        )
        mirror = False

    images = fn.crop_mirror_normalize(
        images.gpu(),
        dtype=types.FLOAT,
        output_layout="CHW",
        crop=(crop, crop),
        mirror=mirror,
    )
    return images, labels


def get_dataloader(
    args, data_dir, seed, is_training=True
):
    pipe = create_dali_pipeline(
        batch_size=args.batch_size,
        num_threads=args.workers,
        device_id=args.local_rank,
        seed=seed,
        data_dir=data_dir,
        crop=args.crop,
        size=args.size,
        dali_cpu=args.dali_cpu,
        shard_id=args.local_rank,
        num_shards=args.world_size,
        is_training=is_training,
    )
    pipe.build()
    return DALIClassificationIterator(
        pipe, reader_name="Reader", last_batch_policy=LastBatchPolicy.PARTIAL
    )


def _test(data_dir):
    from argparse import Namespace
    args = Namespace(
        batch_size=8,
        workers=1,
        local_rank=0,
        size=16,
        crop=14,
        dali_cpu=False,
        world_size=1,
    )
    train_dataloader = get_dataloader(
        args, data_dir, 0, False
    )
    out = train_dataloader.next()


if __name__ == '__main__':
    _test("/data/freedwu/imagenet4latent/train")

The dataset is attached as train.zip.

Freed-Wu, Sep 14 '22 15:09

Hi @Freed-Wu,

I have managed to reproduce the problem. In your case, fn.readers.caffe returns empty images, and DALI cannot handle the transfer of an empty tensor to the GPU. This is definitely an issue in DALI. As far as I know, the Caffe sample entry schema is:

Datum {
  optional int32 channels = 1;
  optional int32 height = 2;
  optional int32 width = 3;
  // the actual image data, in bytes
  optional bytes data = 4;
  optional int32 label = 5;
  // Optionally, the datum could also hold float data.
  repeated float float_data = 6;
  // If true data contains an encoded image that need to be decoded
  optional bool encoded = 7 [default = false];
}

So, to work around this on your side, the latent should be assigned to datum.data.

JanuszL, Sep 14 '22 17:09

However, only uint8 arrays can be assigned to datum.data; int32 arrays go to datum.float_data. So how can I handle int32 features rather than uint8 images?

Freed-Wu, Sep 14 '22 17:09

You can cast it to int8 and then use the DALI reinterpret operator to cast it back to int32.

JanuszL, Sep 14 '22 17:09

Casting int32 directly to uint8 will overflow/underflow. How can I avoid that?

Freed-Wu, Sep 14 '22 17:09

By cast, I mean a different interpretation of the same underlying bytes, like a NumPy view.

JanuszL, Sep 14 '22 17:09
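A minimal NumPy sketch of the difference between a value cast and a byte view, assuming the features are int32 arrays. `.view()` only relabels the bytes, so negative values such as -1024 survive the round trip; `.astype()` converts values and wraps them. The helper names are hypothetical, for illustration only.

```python
import numpy as np

def to_bytes_view(features: np.ndarray) -> np.ndarray:
    """Reinterpret int32 features as uint8 without changing any bytes."""
    return features.view(np.uint8)

def from_bytes_view(raw: np.ndarray) -> np.ndarray:
    """Reinterpret uint8 bytes back as int32 (inverse of to_bytes_view)."""
    return raw.view(np.int32)

x = np.full((2, 2), -1024, dtype=np.int32)

# Value cast: each element is converted, wrapping modulo 256, so information is lost
lossy = x.astype(np.uint8)

# Byte view: 4 uint8 elements per int32 element along the last axis
raw = to_bytes_view(x)
print(raw.dtype, raw.shape)  # uint8 (2, 8)

restored = from_bytes_view(raw)
print(np.array_equal(restored, x))  # True
```

The view never touches the data, so the only thing that changes is how the last axis is counted (4x longer as uint8).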

I see, so it is like this:

import numpy as np
x = -1024 * np.ones([2, 2], dtype=np.int32)
xv = x.view(dtype=np.uint8)

And how do I reinterpret the uint8 data back to int32? Is it just this?

  fn.reinterpret(images, dtype=np.int32)

And this is just a temporary workaround until the bug is fixed, right?

Thanks for your answer.

Freed-Wu, Sep 14 '22 17:09

I changed the code to

        images = fn.reinterpret(images, dtype=DALIDataType.INT32)

Now I get:

RuntimeError: Critical error in pipeline:
Error when executing GPU operator Resize encountered:
[/opt/dali/dali/pipeline/operator/op_schema.h:456] The number of dimensions 1 does not match any of the allowed layouts for input 0. Valid layouts are:
HWC
FHWC
CHW
FCHW
CFHW
DHWC
FDHWC
CDHW
FCDHW
CFDHW

What happened?

Freed-Wu, Sep 14 '22 18:09

These are the feature dimensions stored in the datum:

In [1]: datum.height
Out[1]: 16

In [2]: datum.width
Out[2]: 64

In [3]: datum.channels
Out[3]: 192

Freed-Wu, Sep 14 '22 18:09
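A width of 64 instead of 16 is what a byte view produces: each int32 element becomes 4 uint8 elements along the last axis. The sketch below assumes the latent is a 192x16x16 int32 array stored through a uint8 view, as suggested earlier in the thread, and that caffe.io.array_to_datum takes channels/height/width from the 3-D array shape and picks `data` for uint8 versus `float_data` otherwise. `datum_header` is a hypothetical stand-in, not Caffe's API.

```python
import numpy as np

def datum_header(arr: np.ndarray) -> dict:
    """Hypothetical stand-in for the header fields array_to_datum fills in."""
    if arr.ndim != 3:
        raise ValueError("expected a 3-D (C, H, W) array")
    channels, height, width = arr.shape
    # uint8 arrays go to `data`; any other dtype goes to `float_data`
    field = "data" if arr.dtype == np.uint8 else "float_data"
    return {"channels": channels, "height": height, "width": width, "field": field}

latent = np.zeros((192, 16, 16), dtype=np.int32)
print(datum_header(latent))
# {'channels': 192, 'height': 16, 'width': 16, 'field': 'float_data'}
print(datum_header(latent.view(np.uint8)))
# {'channels': 192, 'height': 16, 'width': 64, 'field': 'data'}
```

So the header describes the byte layout (192x16x64 uint8), and the int32 shape 192x16x16 has to be restored after reinterpreting.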

Can you share the updated data set?

JanuszL, Sep 14 '22 22:09

train.zip

Freed-Wu, Sep 15 '22 05:09

As far as I understand, the width and height provided in each entry are only hints unless you set encoded=False to indicate that raw images are stored, which don't require any decoding. Otherwise, DALI treats the provided data as a 1D array that needs to be decoded.

JanuszL, Sep 15 '22 07:09

Where should I set encoded=False: fn.resize(encoded=False) or fn.reinterpret(encoded=False)?

@pipeline_def
def create_dali_pipeline(
    data_dir,
    crop,
    size,
    shard_id,
    num_shards,
    dali_cpu=False,
    is_training=True,
):
    images, labels = fn.readers.caffe(
        name="Reader", path=data_dir, shard_id=shard_id, num_shards=num_shards
    )
    labels = labels.gpu()
    dali_device = "cpu" if dali_cpu else "gpu"
    decoder_device = "cpu" if dali_cpu else "mixed"
    # ask nvJPEG to preallocate memory for the biggest sample in ImageNet for CPU and GPU to avoid reallocations in runtime
    device_memory_padding = 211025920 if decoder_device == "mixed" else 0
    host_memory_padding = 140544512 if decoder_device == "mixed" else 0
    # ask HW NVJPEG to allocate memory ahead for the biggest image in the data set to avoid reallocations in runtime
    preallocate_width_hint = 5980 if decoder_device == "mixed" else 0
    preallocate_height_hint = 6430 if decoder_device == "mixed" else 0
    if is_training:
        images = fn.decoders.image_random_crop(
            images,
            device=decoder_device,
            output_type=types.RGB,
            device_memory_padding=device_memory_padding,
            host_memory_padding=host_memory_padding,
            preallocate_width_hint=preallocate_width_hint,
            preallocate_height_hint=preallocate_height_hint,
            random_aspect_ratio=[0.8, 1.25],
            random_area=[0.1, 1.0],
            num_attempts=100,
        )
        images = fn.resize(
            images,
            device=dali_device,
            resize_x=crop,
            resize_y=crop,
            interp_type=types.INTERP_TRIANGULAR,
        )
        mirror = fn.random.coin_flip(probability=0.5)
    else:
        images = fn.reinterpret(images, dtype=DALIDataType.INT32)
        images = fn.resize(
            images,
            device=dali_device,
            size=size,
            mode="not_smaller",
            interp_type=types.INTERP_TRIANGULAR,
        )
        mirror = False

    images = fn.crop_mirror_normalize(
        images.gpu(),
        dtype=types.FLOAT,
        output_layout="CHW",
        crop=(crop, crop),
        mirror=mirror,
    )
    return images, labels

Freed-Wu, Sep 15 '22 08:09

fn.resize(encoded=False) or fn.reinterpret(encoded=False)?

It seems neither fn.resize nor fn.reinterpret has an encoded argument...

Freed-Wu, Sep 15 '22 14:09

I see: after fn.reinterpret, the CxHxW data is still a flat 1D array of C*H*W elements, which cannot be recognized as CxHxW.

Freed-Wu, Sep 15 '22 14:09

I was talking about the LMDB schema mentioned in https://github.com/NVIDIA/DALI/issues/4254#issuecomment-1247064906. Regarding the fn.reinterpret operator, you can set the layout ("CHW" or "HWC") and the shape; in this case, I would use rel_shape=[1, 1, -1].

JanuszL, Sep 15 '22 14:09
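In NumPy terms, reinterpreting a flat 1D byte sample with an explicit dtype and shape amounts to a view followed by a reshape. This is an analogy only and does not call DALI: it assumes a 192x16x16 int32 feature map stored as raw bytes, and `reinterpret_like` is a hypothetical name. A -1 in the shape lets NumPy infer that dimension, similar in spirit to the -1 in the rel_shape suggestion.

```python
import numpy as np

def reinterpret_like(flat_bytes: np.ndarray, dtype, shape) -> np.ndarray:
    """Hypothetical NumPy analogue: view a flat uint8 buffer as `dtype`, then reshape."""
    if flat_bytes.ndim != 1 or flat_bytes.dtype != np.uint8:
        raise ValueError("expected a flat uint8 buffer")
    # -1 in `shape` makes NumPy infer that dimension from the element count
    return flat_bytes.view(dtype).reshape(shape)

features = np.arange(192 * 16 * 16, dtype=np.int32).reshape(192, 16, 16)
# A 1D byte sample, as a reader would return undecoded data
flat = np.frombuffer(features.tobytes(), dtype=np.uint8)

chw = reinterpret_like(flat, np.int32, (192, 16, 16))
print(flat.shape, chw.shape)  # (196608,) (192, 16, 16)
```

Without the shape step, the reinterpreted sample stays 1D, which is why operators that expect an image layout reject it.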

Now I get:

RuntimeError: Critical error in pipeline:
Error when executing GPU operator Resize encountered:
[/opt/dali/dali/operators/image/resize/resize_base.cc:45] Unsupported type: int32. Supported types are: uint8, int16, uint16 and float

@pipeline_def
def create_dali_pipeline(
    data_dir,
    crop,
    size,
    shard_id,
    num_shards,
    dali_cpu=False,
    is_training=True,
):
    images, labels = fn.readers.caffe(
        name="Reader", path=data_dir, shard_id=shard_id, num_shards=num_shards
    )
    images = fn.reinterpret(
        images,
        dtype=DALIDataType.INT32,
        shape=[192, 16, 16],
    )
    images = images.gpu()
    labels = labels.gpu()
    dali_device = "cpu" if dali_cpu else "gpu"
    decoder_device = "cpu" if dali_cpu else "mixed"
    # ask nvJPEG to preallocate memory for the biggest sample in ImageNet for CPU and GPU to avoid reallocations in runtime
    device_memory_padding = 211025920 if decoder_device == "mixed" else 0
    host_memory_padding = 140544512 if decoder_device == "mixed" else 0
    # ask HW NVJPEG to allocate memory ahead for the biggest image in the data set to avoid reallocations in runtime
    preallocate_width_hint = 5980 if decoder_device == "mixed" else 0
    preallocate_height_hint = 6430 if decoder_device == "mixed" else 0
    images = fn.resize(
        images,
        device=dali_device,
        size=size,
        mode="not_smaller",
        interp_type=types.INTERP_TRIANGULAR,
    )
    mirror = False

    images = fn.crop_mirror_normalize(
        images.gpu(),
        dtype=types.FLOAT,
        output_layout="CHW",
        crop=(crop, crop),
        mirror=mirror,
    )
    return images, labels

Freed-Wu, Sep 15 '22 15:09

If I remove resize and only crop, the code runs, but the channels are wrong.

Before crop

@pipeline_def
def create_dali_pipeline(
    data_dir,
    crop,
    size,
    shard_id,
    num_shards,
    dali_cpu=False,
    is_training=True,
):
    images, labels = fn.readers.caffe(
        name="Reader", path=data_dir, shard_id=shard_id, num_shards=num_shards
    )
    images = fn.reinterpret(
        images,
        dtype=DALIDataType.INT32,
        shape=[192, 16, 16],
    )
    images = images.gpu()
    dali_device = "cpu" if dali_cpu else "gpu"
    decoder_device = "cpu" if dali_cpu else "mixed"
    # ask nvJPEG to preallocate memory for the biggest sample in ImageNet for CPU and GPU to avoid reallocations in runtime
    device_memory_padding = 211025920 if decoder_device == "mixed" else 0
    host_memory_padding = 140544512 if decoder_device == "mixed" else 0
    # ask HW NVJPEG to allocate memory ahead for the biggest image in the data set to avoid reallocations in runtime
    preallocate_width_hint = 5980 if decoder_device == "mixed" else 0
    preallocate_height_hint = 6430 if decoder_device == "mixed" else 0
    if is_training:
        images = fn.decoders.image_random_crop(
            images,
            device=decoder_device,
            output_type=types.RGB,
            device_memory_padding=device_memory_padding,
            host_memory_padding=host_memory_padding,
            preallocate_width_hint=preallocate_width_hint,
            preallocate_height_hint=preallocate_height_hint,
            random_aspect_ratio=[0.8, 1.25],
            random_area=[0.1, 1.0],
            num_attempts=100,
        )
        images = fn.resize(
            images,
            device=dali_device,
            resize_x=crop,
            resize_y=crop,
            interp_type=types.INTERP_TRIANGULAR,
        )
        mirror = fn.random.coin_flip(probability=0.5)
    else:
        # images = fn.resize(
        #     images,
        #     device=dali_device,
        #     size=size,
        #     mode="not_smaller",
        #     interp_type=types.INTERP_TRIANGULAR,
        # )
        mirror = False

    # images = fn.crop_mirror_normalize(
    #     images,
    #     dtype=types.FLOAT,
    #     output_layout="CHW",
    #     crop=(crop, crop),
    #     mirror=mirror,
    # )
    return images, labels

This returns features of shape 1x192x16x16 with dtype int32. I want to crop them to 1x192x14x14.

After crop

@pipeline_def
def create_dali_pipeline(
    data_dir,
    crop,
    size,
    shard_id,
    num_shards,
    dali_cpu=False,
    is_training=True,
):
    images, labels = fn.readers.caffe(
        name="Reader", path=data_dir, shard_id=shard_id, num_shards=num_shards
    )
    images = fn.reinterpret(
        images,
        dtype=DALIDataType.INT32,
        shape=[192, 16, 16],
    )
    images = images.gpu()
    dali_device = "cpu" if dali_cpu else "gpu"
    decoder_device = "cpu" if dali_cpu else "mixed"
    # ask nvJPEG to preallocate memory for the biggest sample in ImageNet for CPU and GPU to avoid reallocations in runtime
    device_memory_padding = 211025920 if decoder_device == "mixed" else 0
    host_memory_padding = 140544512 if decoder_device == "mixed" else 0
    # ask HW NVJPEG to allocate memory ahead for the biggest image in the data set to avoid reallocations in runtime
    preallocate_width_hint = 5980 if decoder_device == "mixed" else 0
    preallocate_height_hint = 6430 if decoder_device == "mixed" else 0
    if is_training:
        images = fn.decoders.image_random_crop(
            images,
            device=decoder_device,
            output_type=types.RGB,
            device_memory_padding=device_memory_padding,
            host_memory_padding=host_memory_padding,
            preallocate_width_hint=preallocate_width_hint,
            preallocate_height_hint=preallocate_height_hint,
            random_aspect_ratio=[0.8, 1.25],
            random_area=[0.1, 1.0],
            num_attempts=100,
        )
        images = fn.resize(
            images,
            device=dali_device,
            resize_x=crop,
            resize_y=crop,
            interp_type=types.INTERP_TRIANGULAR,
        )
        mirror = fn.random.coin_flip(probability=0.5)
    else:
        # images = fn.resize(
        #     images,
        #     device=dali_device,
        #     size=size,
        #     mode="not_smaller",
        #     interp_type=types.INTERP_TRIANGULAR,
        # )
        mirror = False

    images = fn.crop_mirror_normalize(
        images,
        dtype=types.FLOAT,
        output_layout="CHW",
        crop=(crop, crop),
        mirror=mirror,
    )
    return images, labels

This returns features of shape 1x16x14x14, not 1x192x14x14. What did I do wrong?

Freed-Wu, Sep 15 '22 15:09
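The 1x16x14x14 shape can be reproduced in plain NumPy under one assumption: the 3D sample (no layout set after reinterpret) is treated as HWC, so output_layout="CHW" moves the last axis to the front before the 14x14 crop is applied to H and W. `center_crop_to_chw` is a hypothetical stand-in for fn.crop_mirror_normalize, covering only the shape arithmetic.

```python
import numpy as np

def center_crop_to_chw(sample: np.ndarray, layout: str, crop: int) -> np.ndarray:
    """Hypothetical stand-in: emit CHW, then center-crop H and W to crop x crop."""
    if layout == "HWC":
        sample = sample.transpose(2, 0, 1)  # HWC -> CHW: the last axis becomes channels
    elif layout != "CHW":
        raise ValueError(f"unsupported layout {layout!r}")
    _, h, w = sample.shape
    y0, x0 = (h - crop) // 2, (w - crop) // 2
    return sample[:, y0:y0 + crop, x0:x0 + crop]

features = np.zeros((192, 16, 16), dtype=np.float32)
# Read as HWC: H=192, W=16, C=16, so the result has 16 channels
print(center_crop_to_chw(features, "HWC", 14).shape)  # (16, 14, 14)
# Read as CHW: the 192 channels are kept
print(center_crop_to_chw(features, "CHW", 14).shape)  # (192, 14, 14)
```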

And how can I random-crop without decoding (i.e. without fn.decoders.image_random_crop())? I want to randomly crop 192x16x16 to 192x14x14.

Freed-Wu, Sep 15 '22 15:09

Hi @Freed-Wu, regarding resize: int32 is not supported, so you can convert the data to a float type first and then resize. Regarding random cropping: you can use the random operators as a source of anchors for the slice/crop operator. Also, crop converts the layout from HWC to CHW, which is why your dimensions are swapped. However, crop doesn't support trimming channels; you can use either the slice operator or tensor indexing.

JanuszL, Sep 16 '22 06:09
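A NumPy sketch of the random-crop idea from the last reply: draw a random top-left anchor and slice a fixed-size window over H and W, leaving the channel axis untouched. It assumes CHW int32 features and casts them to float32 first, mirroring the advice to convert int32 to float before resizing. In a DALI pipeline the anchor would come from a random operator and the window from the slice/crop operator; this sketch does not call DALI, and `random_crop_chw` is a hypothetical name.

```python
import numpy as np

def random_crop_chw(sample: np.ndarray, crop: int, rng: np.random.Generator) -> np.ndarray:
    """Randomly crop H and W of a CHW array to crop x crop; all channels are kept."""
    _, h, w = sample.shape
    if crop > h or crop > w:
        raise ValueError("crop is larger than the input")
    # Random anchor: top-left corner, uniform over all valid positions
    y0 = int(rng.integers(0, h - crop + 1))
    x0 = int(rng.integers(0, w - crop + 1))
    return sample[:, y0:y0 + crop, x0:x0 + crop]

rng = np.random.default_rng(0)
features = np.arange(192 * 16 * 16, dtype=np.int32).reshape(192, 16, 16)
as_float = features.astype(np.float32)  # int32 -> float, e.g. before a resize
cropped = random_crop_chw(as_float, 14, rng)
print(cropped.shape, cropped.dtype)  # (192, 14, 14) float32
```

Because the crop only indexes the spatial axes, the channel count is preserved, which is the 192x16x16 to 192x14x14 case asked about.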