DALI
OOM occurs
Hello, this task is video classification training. The training set is 240,000 4-second video clips, and DALI is used for video loading. The training host has 500 GB of RAM. During training, memory usage keeps increasing until it stabilizes at about 96%. If the training set is increased to 1 million clips, OOM occurs immediately. Why does DALI occupy so much memory during training, and how can the OOM be avoided? Below is the memory usage curve during training on the 240,000-clip dataset.
from nvidia.dali import fn, pipeline_def, types
from nvidia.dali.plugin import pytorch


class DALILoader():
    def __init__(self, batch_size, file_list, sequence_length, step, stride, crop_size, device_id, mode):
        if mode == 'train':
            self.pipeline = self._create_video_reader_pipeline_train(batch_size=batch_size,
                                                                     device_id=device_id,
                                                                     num_threads=8,
                                                                     file_list=file_list,
                                                                     sequence_length=sequence_length,
                                                                     step=step,
                                                                     stride=stride,
                                                                     crop_size=crop_size)
        else:
            self.pipeline = self._create_video_reader_pipeline_infer(batch_size=batch_size,
                                                                     device_id=device_id,
                                                                     num_threads=8,
                                                                     file_list=file_list,
                                                                     sequence_length=sequence_length,
                                                                     step=step,
                                                                     stride=stride,
                                                                     crop_size=crop_size)
        self.pipeline.build()
        self.epoch_size = self.pipeline.epoch_size("Reader")
        self.dali_iterator = pytorch.DALIGenericIterator(self.pipeline,
                                                         ["data", "label"],
                                                         reader_name="Reader",
                                                         auto_reset=True,
                                                         last_batch_policy=pytorch.LastBatchPolicy.FILL,
                                                         last_batch_padded=False)

    @pipeline_def
    def _create_video_reader_pipeline_train(self, file_list, sequence_length, step, stride, crop_size):
        # GPU video reader: decodes sequences of `sequence_length` frames on the GPU.
        images, labels = fn.readers.video(device="gpu",
                                          file_list=file_list,
                                          sequence_length=sequence_length,
                                          step=step,
                                          stride=stride,
                                          normalized=False,
                                          random_shuffle=True,
                                          image_type=types.RGB,
                                          dtype=types.FLOAT,
                                          initial_fill=1024,
                                          pad_last_batch=True,
                                          name="Reader")
        # images = fn.resize(images, resize_x=398, resize_y=224)
        # Random horizontal crop position as training augmentation.
        images = fn.crop(images, crop=crop_size, dtype=types.FLOAT,
                         crop_pos_x=fn.random.uniform(range=(0.1, 0.9)),
                         crop_pos_y=1)
        return images, labels

    @pipeline_def
    def _create_video_reader_pipeline_infer(self, file_list, sequence_length, step, stride, crop_size):
        images, labels = fn.readers.video(device="gpu",
                                          file_list=file_list,
                                          sequence_length=sequence_length,
                                          step=step,
                                          stride=stride,
                                          normalized=False,
                                          random_shuffle=True,
                                          image_type=types.RGB,
                                          dtype=types.FLOAT,
                                          initial_fill=1024,
                                          pad_last_batch=True,
                                          name="Reader")
        # images = fn.resize(images, resize_x=398, resize_y=224)
        # Fixed crop position for inference.
        images = fn.crop(images, crop=crop_size, dtype=types.FLOAT,
                         crop_pos_x=0.5,
                         crop_pos_y=1)
        return images, labels

    def __len__(self):
        return int(self.epoch_size)

    def __iter__(self):
        return self.dali_iterator.__iter__()
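For context, the loader is driven from the training loop roughly as follows (a hypothetical usage sketch; the list file name and hyper-parameter values below are made up, not the real configuration):

loader = DALILoader(batch_size=8,
                    file_list="train_file_list.txt",   # one "<video path> <label>" entry per line
                    sequence_length=16,
                    step=16,
                    stride=1,
                    crop_size=(224, 224),
                    device_id=0,
                    mode='train')
for batch in loader:
    frames = batch[0]["data"]     # GPU tensor of frame sequences
    labels = batch[0]["label"]
    # ... forward/backward pass ...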
Hi @zhanghang-cv,
For now, DALI creates a libavutil context for each video in the dataset - see https://github.com/NVIDIA/DALI/issues/2220 for more details. So in your case, with 1 million videos this alone can consume around 10 GB of CPU RAM. We can think about the trade-off between recreating the context each time a video is needed and keeping it around to speed up decoding. Creating a context is not free and can hurt the overall decoding speed, especially since DALI composes batches from sequences picked at random from any video in the dataset. The only solution that comes to my mind is to cache only N contexts and free the least recently used ones.
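To illustrate that last idea (a conceptual sketch only, not DALI's actual internals: ContextCache and open_context are hypothetical stand-ins for the per-file demuxer state the video reader keeps), a least-recently-used bound on the number of live contexts could look like this:

from collections import OrderedDict

class ContextCache:
    """Keep at most `capacity` decoder contexts; evict the least recently used one."""
    def __init__(self, capacity, open_context):
        self.capacity = capacity
        self.open_context = open_context   # hypothetical factory that opens a context for a file
        self.cache = OrderedDict()

    def get(self, filename):
        if filename in self.cache:
            self.cache.move_to_end(filename)           # mark as most recently used
            return self.cache[filename]
        if len(self.cache) >= self.capacity:
            _, evicted = self.cache.popitem(last=False)  # drop the least recently used context
            evicted.close()                              # free its memory
        ctx = self.open_context(filename)
        self.cache[filename] = ctx
        return ctx

The trade-off is explicit: a larger capacity keeps decoding fast at the cost of resident memory, while a smaller one bounds RAM but pays the context re-creation cost whenever an evicted video is sampled again.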
Thank you for your reply. What I did not mention before is that this task uses 8 pipelines for data loading, so a large memory footprint is understandable. Below is an up-to-date memory usage curve (ending in OOM). As I understand it, the process has three stages: in the first stage the pipelines are initialized; in the second stage data is preloaded, and the first batch is produced at its end; in the third stage batches are fetched in a loop for training. We observed that every time a batch is loaded in the third stage, memory slowly increases (which eventually causes the OOM). What is the reason for this? Is each batch of data kept in memory after it is loaded?
Hi @zhanghang-cv,
How exactly do you measure memory consumption? Can the increase come from the fact that OS is caching the data in RAM when accessing it from the drive?
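For example, one way to tell the two apart (just an illustration, assuming psutil is available in the training environment) is to log the process RSS together with the OS page cache each iteration; growing RSS points at the process itself, while a growing "cached" figure is reclaimable page cache and not a leak:

import psutil

proc = psutil.Process()

def log_memory(step):
    # Resident memory actually owned by the training process.
    rss_gb = proc.memory_info().rss / 1024**3
    # Memory the OS uses for the page cache; reclaimable, not a leak.
    cached_gb = psutil.virtual_memory().cached / 1024**3
    print(f"step {step}: rss={rss_gb:.1f} GB, page cache={cached_gb:.1f} GB")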
That is also possible. I want to confirm one thing: after each batch of data is loaded, does DALI keep that data in memory for the next call?
DALI doesn't use RAM to store decoded video for the GPU video decoder, so it doesn't seem to be the reason.