
How to deal with large videos

Open benemana opened this issue 4 months ago • 0 comments

Hi Giorgos, thanks again for this work and for your support in my previous issues.

After some experiments on custom videos, I'm struggling to calculate the similarity between large videos (>7 min) because of the limited amount of memory on my GPU.

In fact, when I try to process such videos, I get a "CUDA out of memory" error.

I managed to overcome this issue in the feature extraction part by setting fps=1 and splitting the query and target videos into N chunks, computing the features for each chunk, and then stacking all N feature tensors together into a single feature tensor (does that make sense?).
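For reference, the chunked extraction described above could look roughly like the sketch below. The names `extract_in_chunks` and `extractor` are hypothetical and not part of s2vs; `extractor` stands in for whatever frame-level feature extractor is used. The sketch assumes the extractor processes each frame independently (no temporal context), in which case joining the per-chunk outputs gives the same result as a single pass. Note that the chunks are joined with `torch.cat` along the frame axis; `torch.stack` would add a new leading dimension instead.

```python
import torch


@torch.no_grad()
def extract_in_chunks(frames, extractor, chunk_size=64, device="cpu"):
    """Run a frame-level extractor over `frames` a few frames at a time.

    frames:    tensor of shape (T, ...), one entry per sampled frame
    extractor: hypothetical callable mapping (t, ...) -> (t, ...) features
    """
    outputs = []
    for start in range(0, frames.shape[0], chunk_size):
        # Only this slice of frames lives on the device at any time.
        chunk = frames[start:start + chunk_size].to(device)
        # Move the result back to the CPU so device memory stays bounded.
        outputs.append(extractor(chunk).cpu())
    # Concatenate along the frame axis: N chunks of (t_i, ...) -> (T, ...).
    return torch.cat(outputs, dim=0)
```

If the extractor did use information across frames, the chunk boundaries would change the features, so this equivalence would no longer hold.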

But when it comes to the similarity part, specifically with the calculate_video_similarity function, I get the above error.

Do you have any suggestions on how to optimize the similarity part for such videos?

I guess that splitting the query and target videos into several chunks and computing the similarity between chunks would not result in a meaningful similarity check, but maybe I'm wrong.

Thanks a lot.

EDIT: After further investigation, it seems that what is causing the error is the torch.einsum operation inside the frame_to_frame_similarity function.
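To illustrate where the memory might go, here is a minimal sketch of a frame-to-frame similarity computed in blocks of query frames. It is not the actual code of frame_to_frame_similarity: the feature layout (frames, regions, dimensions), the einsum signature, and the max/mean reduction over regions are all assumptions, and `f2f_similarity` and `f2f_similarity_chunked` are hypothetical names. Under those assumptions the einsum creates an intermediate tensor with Tq × R × R × Tt elements, and processing the query frames in blocks bounds that intermediate while producing the same Tq × Tt matrix.

```python
import torch


def f2f_similarity(query, target):
    """Assumed region-level similarity between two sets of frame features.

    query:  (Tq, R, D) region features per query frame
    target: (Tt, R, D) region features per target frame
    """
    # Large intermediate: every query region against every target region.
    sim = torch.einsum("iok,jpk->iopj", query, target)  # (Tq, R, R, Tt)
    sim = sim.max(dim=2).values  # max over target regions -> (Tq, R, Tt)
    return sim.mean(dim=1)       # mean over query regions -> (Tq, Tt)


@torch.no_grad()
def f2f_similarity_chunked(query, target, chunk_size=128):
    """Same result as f2f_similarity, built one block of query frames at a time."""
    rows = []
    for start in range(0, query.shape[0], chunk_size):
        block = query[start:start + chunk_size]
        # The intermediate is now at most (chunk_size, R, R, Tt).
        rows.append(f2f_similarity(block, target))
    # Each block yields complete rows of the matrix, so concatenation is exact.
    return torch.cat(rows, dim=0)
```

Because each row of the frame-to-frame matrix depends only on its own query frame, this is different from scoring video chunks independently: the full matrix is reassembled before any video-level step. Whether the later stages of calculate_video_similarity fit in memory with the full Tq × Tt matrix is a separate question that this sketch does not address.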

benemana, Mar 06 '24 10:03