
Required GPU memory depends on the video length.

Open ysig opened this issue 2 years ago • 1 comment

I've managed to run run_tokenflow_pnp.py on a short excerpt of my video (5 s), and the result looks really cool, but when I run it on the full 5-minute video it crashes with a CUDA OOM error even after dropping the batch size to 1.

This memory scaling with video length, likely caused by the extended attention, seems like a major limitation of the method, yet it is not highlighted in the discussion section or anywhere else in the paper (as far as I can tell).

Is it possible to offload part of the attention computation to the CPU so that the number of frames is not a bottleneck?

ysig avatar Sep 26 '23 14:09 ysig

that's exactly what I did here https://github.com/omerbt/TokenFlow/issues/32 (in a way). It handled longer sequences, but not unlimited ones, since the whole attention data still has to be fed back to the GPU before the latent-denoising step (and I didn't manage to make that step work in batches).

eps696 avatar Oct 03 '23 22:10 eps696
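For readers landing here: the batching idea discussed above can be sketched generically. This is a minimal, hedged illustration (plain NumPy, not TokenFlow's actual code; all names are hypothetical) of why chunking attention over queries bounds peak memory: the full (n, n) score matrix is never materialized, so memory grows with the chunk size rather than with the total number of frame tokens.

```python
import numpy as np

def chunked_attention(q, k, v, chunk=4):
    """Attention computed over query chunks.

    q, k, v: arrays of shape (n, d), where n grows with video length.
    Only a (chunk, n) score block exists at any time, so peak memory
    is O(chunk * n) instead of O(n * n).
    """
    out = np.empty_like(q)
    scale = 1.0 / np.sqrt(q.shape[1])
    for i in range(0, q.shape[0], chunk):
        s = (q[i:i + chunk] @ k.T) * scale
        # numerically stable softmax over the key axis
        s = np.exp(s - s.max(axis=1, keepdims=True))
        out[i:i + chunk] = (s / s.sum(axis=1, keepdims=True)) @ v
    return out
```

The same chunks could in principle be staged on CPU and moved to the GPU one at a time, which is the offloading direction the thread is asking about; the catch noted above is that TokenFlow's denoising step still consumes the attention data all at once.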