DALI
[bug] DALI cannot handle the transfer of an empty tensor to the GPU
I convert images (uint8) to features (int32, after quantization), then store them in a Caffe LMDB. However, when I try to read them with DALI, I get this error:
RuntimeError: Critical error in pipeline:
Error when executing Mixed operator decoders__ImageRandomCrop encountered:
[/opt/dali/dali/pipeline/data/buffer.h:142] Assert on "type_.id() == TypeTable::GetTypeId<T>()" failed: Calling type does not match buffer data type, requested type: uint8 current buffer type:
int32. To set type for the Buffer use 'set_type<T>()' or Resize(shape, type) first.
How can fn.readers.caffe support a data type other than uint8? Thanks.
Hello @Freed-Wu, thanks for reaching out to us for help.
It seems that you're passing something that's not an array of bytes to the image decoder (fn.decoders.image_random_crop). If the image is already stored in raw form in the LMDB file, you don't need an image decoder.
That's how much I can guess from your question without looking at the pipeline code.
without looking at the pipeline code.
Sorry for my forgetfulness :smile: The code is below. Do you mean that, because the LMDB file contains raw data, removing all the image decoding code will solve this problem? Thanks!
from nvidia.dali import pipeline_def, fn, types


@pipeline_def
def create_dali_pipeline(
    data_dir,
    crop,
    size,
    shard_id,
    num_shards,
    dali_cpu=False,
    is_training=True,
):
    images, labels = fn.readers.caffe(
        name="Reader", path=data_dir, shard_id=shard_id, num_shards=num_shards
    )
    labels = labels.gpu()
    dali_device = "cpu" if dali_cpu else "gpu"
    decoder_device = "cpu" if dali_cpu else "mixed"
    # ask nvJPEG to preallocate memory for the biggest sample in ImageNet for CPU and GPU to avoid reallocations in runtime
    device_memory_padding = 211025920 if decoder_device == "mixed" else 0
    host_memory_padding = 140544512 if decoder_device == "mixed" else 0
    # ask HW NVJPEG to allocate memory ahead for the biggest image in the data set to avoid reallocations in runtime
    preallocate_width_hint = 5980 if decoder_device == "mixed" else 0
    preallocate_height_hint = 6430 if decoder_device == "mixed" else 0
    if is_training:
        images = fn.decoders.image_random_crop(
            images,
            device=decoder_device,
            output_type=types.RGB,
            device_memory_padding=device_memory_padding,
            host_memory_padding=host_memory_padding,
            preallocate_width_hint=preallocate_width_hint,
            preallocate_height_hint=preallocate_height_hint,
            random_aspect_ratio=[0.8, 1.25],
            random_area=[0.1, 1.0],
            num_attempts=100,
        )
        images = fn.resize(
            images,
            device=dali_device,
            resize_x=crop,
            resize_y=crop,
            interp_type=types.INTERP_TRIANGULAR,
        )
        mirror = fn.random.coin_flip(probability=0.5)
    else:
        images = fn.decoders.image(
            images, device=decoder_device, output_type=types.RGB
        )
        images = fn.resize(
            images,
            device=dali_device,
            size=size,
            mode="not_smaller",
            interp_type=types.INTERP_TRIANGULAR,
        )
        mirror = False
    images = fn.crop_mirror_normalize(
        images.gpu(),
        dtype=types.FLOAT,
        output_layout="CHW",
        crop=(crop, crop),
        mirror=mirror,
    )
    return images, labels
If I remove fn.decoders.image(), the code throws this error:
RuntimeError: Critical error in pipeline:
Error when executing Mixed operator MakeContiguous, instance name: "__MakeContiguous_Reader[1]", encountered:
[/opt/dali/dali/pipeline/data/tensor_list.h:191] Assert on "IsValidType(new_type)" failed: TensorList cannot be resized with invalid type. To zero out the TensorList Reset() can be used.
@Freed-Wu This last error is quite unexpected - and quite possibly a bug. Which version of DALI are you using?
❯ pip show nvidia-dali-cuda110
Name: nvidia-dali-cuda110
Version: 1.16.0
Summary: NVIDIA DALI for CUDA 11.0. Git SHA: 83da7876a646b6c081df8bb816be97b08db54612
Home-page: https://github.com/NVIDIA/dali
Author: NVIDIA Corporation
Author-email:
License: Apache License 2.0
Location: /home/wzy/.local/lib/python3.10/site-packages
Requires:
Required-by:
@Freed-Wu It would be very helpful if you could share some example LMDB file that causes these problems. It can be smaller or contain some made-up data. The fact that you cannot use it with image decoder is expected - what it needs is really a raw image file loaded to memory - with headers and everything - so it's pretty obvious that it requires a byte stream. However, the error without image decoder is what worries me.
This is test.py, with fn.decoders.image removed:
"""https://github.com/NVIDIA/DALI/issues/4254"""
from nvidia.dali import pipeline_def, fn, types


@pipeline_def
def create_dali_pipeline(
    data_dir,
    crop,
    size,
    shard_id,
    num_shards,
    dali_cpu=False,
    is_training=True,
):
    images, labels = fn.readers.caffe(
        name="Reader", path=data_dir, shard_id=shard_id, num_shards=num_shards
    )
    labels = labels.gpu()
    dali_device = "cpu" if dali_cpu else "gpu"
    decoder_device = "cpu" if dali_cpu else "mixed"
    # ask nvJPEG to preallocate memory for the biggest sample in ImageNet for CPU and GPU to avoid reallocations in runtime
    device_memory_padding = 211025920 if decoder_device == "mixed" else 0
    host_memory_padding = 140544512 if decoder_device == "mixed" else 0
    # ask HW NVJPEG to allocate memory ahead for the biggest image in the data set to avoid reallocations in runtime
    preallocate_width_hint = 5980 if decoder_device == "mixed" else 0
    preallocate_height_hint = 6430 if decoder_device == "mixed" else 0
    if is_training:
        images = fn.decoders.image_random_crop(
            images,
            device=decoder_device,
            output_type=types.RGB,
            device_memory_padding=device_memory_padding,
            host_memory_padding=host_memory_padding,
            preallocate_width_hint=preallocate_width_hint,
            preallocate_height_hint=preallocate_height_hint,
            random_aspect_ratio=[0.8, 1.25],
            random_area=[0.1, 1.0],
            num_attempts=100,
        )
        images = fn.resize(
            images,
            device=dali_device,
            resize_x=crop,
            resize_y=crop,
            interp_type=types.INTERP_TRIANGULAR,
        )
        mirror = fn.random.coin_flip(probability=0.5)
    else:
        # remove fn.decoders.image
        images = fn.resize(
            images,
            device=dali_device,
            size=size,
            mode="not_smaller",
            interp_type=types.INTERP_TRIANGULAR,
        )
        mirror = False
    images = fn.crop_mirror_normalize(
        images.gpu(),
        dtype=types.FLOAT,
        output_layout="CHW",
        crop=(crop, crop),
        mirror=mirror,
    )
    return images, labels


def get_dataloader(
    args, data_dir, seed, is_training=True
):
    pipe = create_dali_pipeline(
        batch_size=args.batch_size,
        num_threads=args.workers,
        device_id=args.local_rank,
        seed=seed,
        data_dir=data_dir,
        crop=args.crop,
        size=args.size,
        dali_cpu=args.dali_cpu,
        shard_id=args.local_rank,
        num_shards=args.world_size,
        is_training=is_training,
    )
    pipe.build()
    return DALIClassificationIterator(
        pipe, reader_name="Reader", last_batch_policy=LastBatchPolicy.PARTIAL
    )


def _test(data_dir):
    from argparse import Namespace
    args = Namespace(
        batch_size=8,
        workers=1,
        local_rank=0,
        size=16,
        crop=14,
        dali_cpu=False,
        world_size=1,
    )
    train_dataloader = get_dataloader(
        args, data_dir, 0, False
    )
    out = train_dataloader.next()


if __name__ == '__main__':
    _test("/data/freedwu/imagenet4latent/train")
Use the following script to generate the latent features in /data/freedwu/imagenet4latent from ImageNet-Caffe:
"""https://github.com/BVLC/caffe/issues/1698#issuecomment-70211045"""
import os
from os.path import dirname as dirn
import sys
from argparse import Namespace

from compressai.zoo import image_models
from caffe.io import array_to_datum
import lmdb
from torch.autograd.grad_mode import no_grad

from datasets.dali.lmdb import get_dataloader  # noqa: E402

model = image_models["cheng2020-attn"](6, pretrained=True)
model = model.cuda()
model.eval()
batch_size = 4
args = Namespace(
    batch_size=batch_size,
    workers=16,
    local_rank=0,
    size=256,
    crop=256,  # not crop
    dali_cpu=False,
    world_size=1,
)
root_data_dir = "/data/bitahub/ILSVRC2012/ImageNet-Caffe"
data_dirs = [os.path.join(root_data_dir, file) for file in ["ilsvrc12_train_lmdb", "ilsvrc12_val_lmdb"]]
root_latent_dir = "/data/freedwu/imagenet4latent"
latent_dirs = [os.path.join(root_latent_dir, file) for file in ["train", "val"]]


def generate_lmdb(data_dir, latent_dir):
    os.makedirs(latent_dir, exist_ok=True)
    dataloader = get_dataloader(args, data_dir, 0, False)
    with lmdb.open(latent_dir, map_size=int(1e12)) as env, env.begin(write=True) as txn, no_grad():
        for i, data in enumerate(dataloader):
            samples, targets = data[0]["data"], data[0]["label"]
            latents = model.gaussian_conditional.quantize(
                model.g_a(samples), "symbols"
            ).cpu().numpy()
            for j, (latent, target) in enumerate(zip(latents, targets)):
                datum = array_to_datum(latent)
                datum.label = target.item()
                key = f"{i * batch_size + j:0>8d}"
                txn.put(key.encode(), datum.SerializeToString())


for data_dir, latent_dir in zip(data_dirs, latent_dirs):
    generate_lmdb(data_dir, latent_dir)
So it is a bug in DALI, right?
@Freed-Wu,
It looks like a DALI bug, but I'm unable to create the test dataset the way you did (the script you shared lacks the get_dataloader definition). Can you create a toy dataset and share it in this thread? It would make debugging much easier.
@Freed-Wu Can you trim your dataset to a handful of samples (or maybe even one sample) so you could post it here?
Sorry for the delay. This is test.py:
"""https://github.com/NVIDIA/DALI/issues/4254"""
from nvidia.dali import pipeline_def, fn, types
from nvidia.dali.plugin.pytorch import (
    DALIClassificationIterator,
    LastBatchPolicy,
)


@pipeline_def
def create_dali_pipeline(
    data_dir,
    crop,
    size,
    shard_id,
    num_shards,
    dali_cpu=False,
    is_training=True,
):
    images, labels = fn.readers.caffe(
        name="Reader", path=data_dir, shard_id=shard_id, num_shards=num_shards
    )
    labels = labels.gpu()
    dali_device = "cpu" if dali_cpu else "gpu"
    decoder_device = "cpu" if dali_cpu else "mixed"
    # ask nvJPEG to preallocate memory for the biggest sample in ImageNet for CPU and GPU to avoid reallocations in runtime
    device_memory_padding = 211025920 if decoder_device == "mixed" else 0
    host_memory_padding = 140544512 if decoder_device == "mixed" else 0
    # ask HW NVJPEG to allocate memory ahead for the biggest image in the data set to avoid reallocations in runtime
    preallocate_width_hint = 5980 if decoder_device == "mixed" else 0
    preallocate_height_hint = 6430 if decoder_device == "mixed" else 0
    if is_training:
        images = fn.decoders.image_random_crop(
            images,
            device=decoder_device,
            output_type=types.RGB,
            device_memory_padding=device_memory_padding,
            host_memory_padding=host_memory_padding,
            preallocate_width_hint=preallocate_width_hint,
            preallocate_height_hint=preallocate_height_hint,
            random_aspect_ratio=[0.8, 1.25],
            random_area=[0.1, 1.0],
            num_attempts=100,
        )
        images = fn.resize(
            images,
            device=dali_device,
            resize_x=crop,
            resize_y=crop,
            interp_type=types.INTERP_TRIANGULAR,
        )
        mirror = fn.random.coin_flip(probability=0.5)
    else:
        # remove fn.decoders.image
        images = fn.resize(
            images,
            device=dali_device,
            size=size,
            mode="not_smaller",
            interp_type=types.INTERP_TRIANGULAR,
        )
        mirror = False
    images = fn.crop_mirror_normalize(
        images.gpu(),
        dtype=types.FLOAT,
        output_layout="CHW",
        crop=(crop, crop),
        mirror=mirror,
    )
    return images, labels


def get_dataloader(
    args, data_dir, seed, is_training=True
):
    pipe = create_dali_pipeline(
        batch_size=args.batch_size,
        num_threads=args.workers,
        device_id=args.local_rank,
        seed=seed,
        data_dir=data_dir,
        crop=args.crop,
        size=args.size,
        dali_cpu=args.dali_cpu,
        shard_id=args.local_rank,
        num_shards=args.world_size,
        is_training=is_training,
    )
    pipe.build()
    return DALIClassificationIterator(
        pipe, reader_name="Reader", last_batch_policy=LastBatchPolicy.PARTIAL
    )


def _test(data_dir):
    from argparse import Namespace
    args = Namespace(
        batch_size=8,
        workers=1,
        local_rank=0,
        size=16,
        crop=14,
        dali_cpu=False,
        world_size=1,
    )
    train_dataloader = get_dataloader(
        args, data_dir, 0, False
    )
    out = train_dataloader.next()


if __name__ == '__main__':
    _test("/data/freedwu/imagenet4latent/train")
The dataset is attached as train.zip.
Hi @Freed-Wu,
I have managed to reproduce the problem. In your case, fn.readers.caffe returns empty images, and DALI cannot handle the transfer of an empty tensor to the GPU. This is definitely an issue in DALI.
As far as I know, the caffe sample entry schema is:
Datum {
  optional int32 channels = 1;
  optional int32 height = 2;
  optional int32 width = 3;
  // the actual image data, in bytes
  optional bytes data = 4;
  optional int32 label = 5;
  // Optionally, the datum could also hold float data.
  repeated float float_data = 6;
  // If true data contains an encoded image that need to be decoded
  optional bool encoded = 7 [default = false];
}
So, to fix this problem on your side, latent should be assigned to datum.data.
However, only uint8 can be assigned to datum.data; int32 will be assigned to datum.float_data. So how can I handle an int32 feature rather than a uint8 image?
You can cast it to int8 and then use the DALI reinterpret operator to cast it back to int32.
Casting int32 directly to uint8 will result in overflow/underflow. How can I avoid that?
By cast, I mean different interpretations of the underlying data, like numpy view.
I see, it is just like this:
import numpy as np
x = -1024 * np.ones([2, 2], dtype=np.int32)
xv = x.view(dtype=np.uint8)
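A minimal NumPy sketch of the reinterpretation idea discussed above, assuming the latent is a C-contiguous int32 array (the values here are made up for illustration). Viewing it as uint8 changes no bytes, so nothing overflows; only the last dimension grows by a factor of 4, and viewing the bytes as int32 again recovers the original values.

```python
import numpy as np

# Hypothetical int32 latent, including values that would overflow a real cast to uint8.
latent = np.array([[-1024, 7], [300, -1]], dtype=np.int32)

# Reinterpret the same memory as bytes: 4 bytes per int32 element.
as_bytes = latent.view(np.uint8)
print(as_bytes.shape)  # (2, 8)

# Raw bytes of the array, suitable for storing in a bytes field.
payload = as_bytes.tobytes()

# Reinterpret the bytes as int32 again and restore the original shape.
restored = np.frombuffer(payload, dtype=np.int32).reshape(latent.shape)
assert np.array_equal(restored, latent)
```

The round trip relies on writer and reader agreeing on element size and byte order, which holds when both run on the same kind of machine.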
And how do I reinterpret np.uint8 back to np.int32? Just fn.reinterpret(images, dtype=np.int32), right?
And the above is just a temporary workaround until the bug is fixed, right?
Thanks for your answer.
I changed the code to
images = fn.reinterpret(images, dtype=DALIDataType.INT32)
Now I get
RuntimeError: Critical error in pipeline:
Error when executing GPU operator Resize encountered:
[/opt/dali/dali/pipeline/operator/op_schema.h:456] The number of dimensions 1 does not match any of the allowed layouts for input 0. Valid layouts are:
HWC
FHWC
CHW
FCHW
CFHW
DHWC
FDHWC
CDHW
FCDHW
CFDHW
What happened?
This is the features' size:
In [1]: datum.height
Out[1]: 16
In [2]: datum.width
Out[2]: 64
In [3]: datum.channels
Out[3]: 192
Can you share the updated data set?
As I understand it, the width and height provided in each entry are hints, unless you set encoded=False to indicate that raw images are stored, which don't require any decoding. Otherwise, DALI treats the provided data as a 1D array that needs to be decoded.
Where do I set encoded=False? fn.resize(encoded=False) or fn.reinterpret(encoded=False)?
@pipeline_def
def create_dali_pipeline(
    data_dir,
    crop,
    size,
    shard_id,
    num_shards,
    dali_cpu=False,
    is_training=True,
):
    images, labels = fn.readers.caffe(
        name="Reader", path=data_dir, shard_id=shard_id, num_shards=num_shards
    )
    labels = labels.gpu()
    dali_device = "cpu" if dali_cpu else "gpu"
    decoder_device = "cpu" if dali_cpu else "mixed"
    # ask nvJPEG to preallocate memory for the biggest sample in ImageNet for CPU and GPU to avoid reallocations in runtime
    device_memory_padding = 211025920 if decoder_device == "mixed" else 0
    host_memory_padding = 140544512 if decoder_device == "mixed" else 0
    # ask HW NVJPEG to allocate memory ahead for the biggest image in the data set to avoid reallocations in runtime
    preallocate_width_hint = 5980 if decoder_device == "mixed" else 0
    preallocate_height_hint = 6430 if decoder_device == "mixed" else 0
    if is_training:
        images = fn.decoders.image_random_crop(
            images,
            device=decoder_device,
            output_type=types.RGB,
            device_memory_padding=device_memory_padding,
            host_memory_padding=host_memory_padding,
            preallocate_width_hint=preallocate_width_hint,
            preallocate_height_hint=preallocate_height_hint,
            random_aspect_ratio=[0.8, 1.25],
            random_area=[0.1, 1.0],
            num_attempts=100,
        )
        images = fn.resize(
            images,
            device=dali_device,
            resize_x=crop,
            resize_y=crop,
            interp_type=types.INTERP_TRIANGULAR,
        )
        mirror = fn.random.coin_flip(probability=0.5)
    else:
        images = fn.reinterpret(images, dtype=DALIDataType.INT32)
        images = fn.resize(
            images,
            device=dali_device,
            size=size,
            mode="not_smaller",
            interp_type=types.INTERP_TRIANGULAR,
        )
        mirror = False
    images = fn.crop_mirror_normalize(
        images.gpu(),
        dtype=types.FLOAT,
        output_layout="CHW",
        crop=(crop, crop),
        mirror=mirror,
    )
    return images, labels
fn.resize(encoded=False) or fn.reinterpret(encoded=False)?
It seems fn.resize and fn.reinterpret don't have an encoded argument...
I see, fn.reinterpret will convert the CxHxW data to 1xCHW, which cannot be recognized as CxHxW.
I was talking about LMDB schema mentioned in https://github.com/NVIDIA/DALI/issues/4254#issuecomment-1247064906.
Regarding the fn.reinterpret operator, you can set the layout ("CHW" or "HWC") and the shape; in this case, I would use rel_shape=[1, 1, -1].
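To see why changing only the dtype leaves a one-dimensional tensor, here is a small NumPy sketch (NumPy, not DALI code). It assumes the reader hands over a flat byte buffer per sample, as the "number of dimensions 1" error suggests, and it uses the sizes reported in the thread; the zero-filled buffer is a stand-in.

```python
import numpy as np

# Sizes reported in the thread: 192 channels, height 16, and a width of 64 bytes,
# which corresponds to 16 int32 values of 4 bytes each.
channels, height, width_bytes = 192, 16, 64

# Stand-in for the flat uint8 buffer of one sample (all zeros, illustration only).
flat = np.zeros(channels * height * width_bytes, dtype=np.uint8)

# Changing only the element type keeps the data one-dimensional.
as_int32 = flat.view(np.int32)
print(as_int32.ndim, as_int32.shape)  # 1 (49152,)

# Supplying the target shape recovers the three dimensions of the feature.
feature = as_int32.reshape(channels, height, width_bytes // 4)
print(feature.shape)  # (192, 16, 16)
```

The element count is the same before and after the reshape (192 * 16 * 16 = 49152), which is the condition any explicit shape has to satisfy.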
Now I get
RuntimeError: Critical error in pipeline:
Error when executing GPU operator Resize encountered:
[/opt/dali/dali/operators/image/resize/resize_base.cc:45] Unsupported type: int32. Supported types are: uint8, int16, uint16 and float
@pipeline_def
def create_dali_pipeline(
    data_dir,
    crop,
    size,
    shard_id,
    num_shards,
    dali_cpu=False,
    is_training=True,
):
    images, labels = fn.readers.caffe(
        name="Reader", path=data_dir, shard_id=shard_id, num_shards=num_shards
    )
    images = fn.reinterpret(
        images,
        dtype=DALIDataType.INT32,
        shape=[192, 16, 16],
    )
    images = images.gpu()
    labels = labels.gpu()
    dali_device = "cpu" if dali_cpu else "gpu"
    decoder_device = "cpu" if dali_cpu else "mixed"
    # ask nvJPEG to preallocate memory for the biggest sample in ImageNet for CPU and GPU to avoid reallocations in runtime
    device_memory_padding = 211025920 if decoder_device == "mixed" else 0
    host_memory_padding = 140544512 if decoder_device == "mixed" else 0
    # ask HW NVJPEG to allocate memory ahead for the biggest image in the data set to avoid reallocations in runtime
    preallocate_width_hint = 5980 if decoder_device == "mixed" else 0
    preallocate_height_hint = 6430 if decoder_device == "mixed" else 0
    images = fn.resize(
        images,
        device=dali_device,
        size=size,
        mode="not_smaller",
        interp_type=types.INTERP_TRIANGULAR,
    )
    mirror = False
    images = fn.crop_mirror_normalize(
        images.gpu(),
        dtype=types.FLOAT,
        output_layout="CHW",
        crop=(crop, crop),
        mirror=mirror,
    )
    return images, labels
If I remove resize and only crop, the code runs; however, the channel count is incorrect.
Before crop
@pipeline_def
def create_dali_pipeline(
    data_dir,
    crop,
    size,
    shard_id,
    num_shards,
    dali_cpu=False,
    is_training=True,
):
    images, labels = fn.readers.caffe(
        name="Reader", path=data_dir, shard_id=shard_id, num_shards=num_shards
    )
    images = fn.reinterpret(
        images,
        dtype=DALIDataType.INT32,
        shape=[192, 16, 16],
    )
    images = images.gpu()
    dali_device = "cpu" if dali_cpu else "gpu"
    decoder_device = "cpu" if dali_cpu else "mixed"
    # ask nvJPEG to preallocate memory for the biggest sample in ImageNet for CPU and GPU to avoid reallocations in runtime
    device_memory_padding = 211025920 if decoder_device == "mixed" else 0
    host_memory_padding = 140544512 if decoder_device == "mixed" else 0
    # ask HW NVJPEG to allocate memory ahead for the biggest image in the data set to avoid reallocations in runtime
    preallocate_width_hint = 5980 if decoder_device == "mixed" else 0
    preallocate_height_hint = 6430 if decoder_device == "mixed" else 0
    if is_training:
        images = fn.decoders.image_random_crop(
            images,
            device=decoder_device,
            output_type=types.RGB,
            device_memory_padding=device_memory_padding,
            host_memory_padding=host_memory_padding,
            preallocate_width_hint=preallocate_width_hint,
            preallocate_height_hint=preallocate_height_hint,
            random_aspect_ratio=[0.8, 1.25],
            random_area=[0.1, 1.0],
            num_attempts=100,
        )
        images = fn.resize(
            images,
            device=dali_device,
            resize_x=crop,
            resize_y=crop,
            interp_type=types.INTERP_TRIANGULAR,
        )
        mirror = fn.random.coin_flip(probability=0.5)
    else:
        # images = fn.resize(
        #     images,
        #     device=dali_device,
        #     size=size,
        #     mode="not_smaller",
        #     interp_type=types.INTERP_TRIANGULAR,
        # )
        mirror = False
    # images = fn.crop_mirror_normalize(
    #     images,
    #     dtype=types.FLOAT,
    #     output_layout="CHW",
    #     crop=(crop, crop),
    #     mirror=mirror,
    # )
    return images, labels
This returns a feature whose shape is 1x192x16x16 and whose dtype is int32. I hope to crop it to 1x192x14x14.
After crop
@pipeline_def
def create_dali_pipeline(
    data_dir,
    crop,
    size,
    shard_id,
    num_shards,
    dali_cpu=False,
    is_training=True,
):
    images, labels = fn.readers.caffe(
        name="Reader", path=data_dir, shard_id=shard_id, num_shards=num_shards
    )
    images = fn.reinterpret(
        images,
        dtype=DALIDataType.INT32,
        shape=[192, 16, 16],
    )
    images = images.gpu()
    dali_device = "cpu" if dali_cpu else "gpu"
    decoder_device = "cpu" if dali_cpu else "mixed"
    # ask nvJPEG to preallocate memory for the biggest sample in ImageNet for CPU and GPU to avoid reallocations in runtime
    device_memory_padding = 211025920 if decoder_device == "mixed" else 0
    host_memory_padding = 140544512 if decoder_device == "mixed" else 0
    # ask HW NVJPEG to allocate memory ahead for the biggest image in the data set to avoid reallocations in runtime
    preallocate_width_hint = 5980 if decoder_device == "mixed" else 0
    preallocate_height_hint = 6430 if decoder_device == "mixed" else 0
    if is_training:
        images = fn.decoders.image_random_crop(
            images,
            device=decoder_device,
            output_type=types.RGB,
            device_memory_padding=device_memory_padding,
            host_memory_padding=host_memory_padding,
            preallocate_width_hint=preallocate_width_hint,
            preallocate_height_hint=preallocate_height_hint,
            random_aspect_ratio=[0.8, 1.25],
            random_area=[0.1, 1.0],
            num_attempts=100,
        )
        images = fn.resize(
            images,
            device=dali_device,
            resize_x=crop,
            resize_y=crop,
            interp_type=types.INTERP_TRIANGULAR,
        )
        mirror = fn.random.coin_flip(probability=0.5)
    else:
        # images = fn.resize(
        #     images,
        #     device=dali_device,
        #     size=size,
        #     mode="not_smaller",
        #     interp_type=types.INTERP_TRIANGULAR,
        # )
        mirror = False
    images = fn.crop_mirror_normalize(
        images,
        dtype=types.FLOAT,
        output_layout="CHW",
        crop=(crop, crop),
        mirror=mirror,
    )
    return images, labels
This returns a feature whose shape is 1x16x14x14, not 1x192x14x14. What did I do wrong?
And how can I random crop without decoding (fn.decoders.image_random_crop())? I hope to random crop 192x16x16 to 192x14x14...
Hi @Freed-Wu,
Regarding resize, int32 is not supported; you can convert it to a float type first and then resize.
Regarding random cropping, you can use random operators as a source of anchors for the slice/crop operator.
Also, crop converts the layout from HWC to CHW, which is why your dimensions are swapped. However, crop doesn't support trimming channels. You can use either the slice operator or tensor indexing.
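The shape arithmetic behind this answer can be sketched in NumPy (again NumPy, not DALI code). The first function assumes, as the reply suggests, that the crop treats a 192x16x16 tensor without layout as HWC and then emits CHW; the second shows the intended result, slicing only the two spatial axes at a random anchor. The function names and the stand-in data are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in feature with the shape from the thread, stored channel-first.
feature = np.arange(192 * 16 * 16, dtype=np.int32).reshape(192, 16, 16)
crop = 14


def center_crop_as_hwc(x, size):
    """Crop the first two axes (treated as H and W), then emit CHW."""
    h, w, _ = x.shape
    top, left = (h - size) // 2, (w - size) // 2
    cropped = x[top:top + size, left:left + size, :]
    return cropped.transpose(2, 0, 1)


def random_crop_chw(x, size, rng):
    """Slice the last two axes at a random anchor, keeping every channel."""
    _, h, w = x.shape
    top = int(rng.integers(0, h - size + 1))
    left = int(rng.integers(0, w - size + 1))
    return x[:, top:top + size, left:left + size]


print(center_crop_as_hwc(feature, crop).shape)    # (16, 14, 14)
print(random_crop_chw(feature, crop, rng).shape)  # (192, 14, 14)
```

The first result matches the 1x16x14x14 output reported above once the batch dimension is added. The second is the kind of indexing a slice with randomly generated anchors would need to express in the pipeline.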