FTVSR
RuntimeError: CUDA out of memory.
Hi,
I've run the single-GPU test on two different machines, one with an RTX 4000 (8 GB) and the other with an A10 (24 GB). Both gave me an OOM error even with num_input_frames = 2. What am I missing?
The first machine, with the RTX 4000, said:
Tried to allocate 620.00 MiB (GPU 0; 7.80 GiB total capacity; 4.31 GiB already allocated; 412.31 MiB free; 6.33 GiB reserved in total by PyTorch)
while the other machine, with the A10, said:
Tried to allocate 5.50 GiB (GPU 0; 22.02 GiB total capacity; 10.77 GiB already allocated; 4.78 GiB free; 15.63 GiB reserved in total by PyTorch)
I changed the REDS dataset to SRFolderMultipleGTDataset, and I have a two-video subset of REDS4_val that looks like this:
```
REDS4_short/
|-- val_sharp/
|--|-- 000/
|--|--|-- %08d.png
|--|-- 001/
|--|--|-- %08d.png
|-- val_sharp_bicubic/
|--|-- X4/
|--|--|-- 000/
|--|--|--|-- %08d.png
|--|--|-- 001/
|--|--|--|-- %08d.png
```
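For completeness, the dataset change looks roughly like this (a sketch, not my exact config; the folder paths and the test_pipeline name are illustrative, based on mmediting's SRFolderMultipleGTDataset):

```python
# Sketch: test dataset pointing at the two-clip subset above.
# Paths and pipeline name are illustrative; adjust to your layout.
test = dict(
    type='SRFolderMultipleGTDataset',
    lq_folder='data/REDS4_short/val_sharp_bicubic/X4',
    gt_folder='data/REDS4_short/val_sharp',
    pipeline=test_pipeline,
    scale=4,
    num_input_frames=2,  # the value mentioned above
    test_mode=True)
```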
And I've used the following command:

```
tools/test.py <path/to/config> <path/to/redsModel> --crf 25 --startIdx 0 --test_frames 50
```
Let me know if you need any further info to help. Thanks!
After debugging for a while, I noticed that the SPyNet compute_flow() call is failing on the HR frames due to memory. My bet is that on the RTX 4000 it fails earlier, on the LR frames, which is why the two machines report different allocation sizes.
It looks like SPyNet computes the flow for all the frames set by the --test_frames parameter in one go, and that's where the OOM happens. Setting --test_frames 10 worked fine on the A10 (24 GB). My question is: is this SPyNet behaviour normal? How can I run a video of 1000+ frames? I believe some batching approach should be taken (see the sketch below), because I'm not interested in just the first 10 frames but in all frames of the video. Thanks!