VidToMe
About efficiency
Thanks for sharing this fascinating project! I tested the default demo with and without VidToMe and found the VidToMe version to be slower. Is this normal?
Yes, this is normal. VidToMe adds two-stage token merging and unmerging operations around each self-attention module, and these introduce computational overhead compared with processing each frame separately (w/o VidToMe).
Where VidToMe improves efficiency is relative to a direct extension of self-attention, which jointly processes the tokens from all frames in each self-attention module.
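A rough cost model can make the trade-off above concrete. The sketch below is a simplified, single-stage approximation and not VidToMe's actual implementation. It counts only the attention matrix products, and the merging overhead is a hypothetical bipartite-matching cost (a similarity matrix between two halves of the tokens). The frame count, tokens per frame, channel width and merge ratio are all illustrative assumptions.

```python
def attention_flops(num_tokens: int, dim: int) -> int:
    """Approximate multiply-adds for softmax(Q K^T) V over num_tokens tokens.

    Q K^T and (scores @ V) each cost num_tokens^2 * dim; projections,
    softmax and the feed-forward layers are ignored in this rough model.
    """
    return 2 * num_tokens * num_tokens * dim


def per_frame_cost(frames: int, tokens_per_frame: int, dim: int) -> int:
    # Each frame attends only to its own tokens (processing frames separately).
    return frames * attention_flops(tokens_per_frame, dim)


def joint_cost(frames: int, tokens_per_frame: int, dim: int) -> int:
    # Direct self-attention extension: one sequence of frames * tokens_per_frame tokens.
    return attention_flops(frames * tokens_per_frame, dim)


def merged_joint_cost(frames: int, tokens_per_frame: int, dim: int,
                      merge_ratio: float) -> int:
    # Joint attention after merging away a fraction merge_ratio of all tokens.
    total = frames * tokens_per_frame
    kept = int(total * (1 - merge_ratio))
    # Hypothetical matching overhead: similarity between two halves of the tokens.
    half = total // 2
    matching = half * half * dim
    # Hypothetical unmerge overhead: copy merged outputs back to every original token.
    unmerge = total * dim
    return attention_flops(kept, dim) + matching + unmerge


if __name__ == "__main__":
    # Illustrative sizes only: 8 frames, 4096 tokens per frame, 320 channels.
    f, n, d = 8, 4096, 320
    base = per_frame_cost(f, n, d)
    print("per-frame     :", 1.0)
    print("merged joint  :", merged_joint_cost(f, n, d, 0.5) / base)
    print("joint (naive) :", joint_cost(f, n, d) / base)
```

Under these assumptions the merged variant costs more than processing frames independently but much less than full joint attention, because joint attention grows quadratically with the total number of tokens across frames. Actual wall-clock numbers depend on the real merging scheme, the attention kernel and the hardware.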