VideoReader with cuda backend fails with num_workers > 1

Open ganessh22 opened this issue 1 year ago • 2 comments

🐛 Describe the bug

I set the video backend to CUDA. I compiled torchvision master from source against ffmpeg 4.2.9 with nvenc.

import torchvision
torchvision.set_video_backend("cuda")

In my dataloader I have:

vid_reader = torchvision.io.VideoReader(video_path, "video")

which causes this error immediately

    vid_reader = torchvision.io.VideoReader(video_path, "video")                                               
  File "/home/ganesh/vision/torchvision/io/video_reader.py", line 161, in __init__                             
    self._c = torch.classes.torchvision.GPUDecoder(src, device)                                                
RuntimeError: CUDA error: initialization error                                                                 
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.  

It works fine if I don't set num_workers in the DataLoader, i.e. with the default num_workers=0.

dataloader = DataLoader(
    dataset,
    collate_fn=collate_fn,
    batch_size=2,
    # num_workers=2,
    # prefetch_factor=2,
    shuffle=True,
)

The DataLoader is shown above for reference. With num_workers and prefetch_factor commented out it works fine. Can Dataset classes not use the GPU inside a worker process?

Versions

torchvision master at commit 3fb88b3ef1ee8107df74ca776cb57931fe3e9e1e, PyTorch nightly as of 27 Oct, ffmpeg 4.2.9, CUDA 11.8.

ganessh22 · Oct 27 '23 10:10

Hi @ganessh22 ,

Using CUDA tensors within a multiprocessing context is not really supported, unfortunately. See e.g. https://pytorch.org/docs/stable/data.html#multi-process-data-loading and the other resources linked from there:

It is generally not recommended to return CUDA tensors in multi-process loading because of many subtleties in using CUDA and sharing CUDA tensors in multiprocessing

NicolasHug · Oct 27 '23 10:10

Thank you. I will use it cautiously. I was able to get the above code working though with

import multiprocessing as mp
mp.set_start_method('spawn', force=True)

at the beginning of the code, because with fork the workers fail with a CUDA re-initialization error. To get a fast video dataset I sadly see no other option, unless I had much more CPU power and RAM.
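
For readers who would rather not change the start method for the whole process, a possible alternative is to request spawn for a single loader through the `multiprocessing_context` argument of `DataLoader`. The sketch below is only an illustration under assumptions: `ToyDataset` and `build_loader` are hypothetical names, the dataset returns CPU tensors instead of decoding video, and nothing here has been checked against the CUDA `VideoReader` backend.

```python
import torch
from torch.utils.data import DataLoader, Dataset


class ToyDataset(Dataset):
    """Hypothetical stand-in for the video dataset in this issue."""

    def __len__(self):
        return 8

    def __getitem__(self, idx):
        # A real dataset would construct torchvision.io.VideoReader here,
        # inside the worker process; this toy version returns a CPU tensor.
        return torch.tensor(idx)


def build_loader(dataset, num_workers=2):
    # multiprocessing_context="spawn" asks this loader to start its workers
    # as fresh interpreters rather than forked copies of the parent process.
    # The setting applies to this loader only, not to the global default.
    return DataLoader(
        dataset,
        batch_size=2,
        num_workers=num_workers,
        multiprocessing_context="spawn",
    )
```

Because spawned workers re-import the main module, the loop that iterates over the loader should sit under an `if __name__ == "__main__":` guard, and the dataset object has to be picklable.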

ganessh22 · Oct 27 '23 16:10