Add ToMe to the techniques used by Speedster
Description
ToMe (Token Merging) is an optimization technique recently released by Meta AI. It accelerates Vision Transformers by merging tokens that carry very similar information, thereby reducing the number of tokens attention has to process.
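To illustrate the core idea, here is a minimal sketch of similarity-based token merging. This is a simplified greedy version for illustration only; Meta's actual algorithm uses bipartite soft matching inside each transformer block, and the function name and NumPy setup below are my own assumptions, not Speedster or ToMe API.

```python
import numpy as np

def merge_most_similar_tokens(tokens: np.ndarray, r: int) -> np.ndarray:
    """Greedily merge r pairs of the most similar tokens by averaging.

    Simplified illustration of the idea behind ToMe; the real method
    (bipartite soft matching, see Meta AI's paper) is more efficient.
    """
    # Cosine similarity between every pair of tokens.
    normed = tokens / np.linalg.norm(tokens, axis=1, keepdims=True)
    sim = normed @ normed.T
    np.fill_diagonal(sim, -np.inf)  # a token cannot merge with itself

    merged = tokens.copy()
    alive = np.ones(len(tokens), dtype=bool)
    for _ in range(r):
        # Pick the most similar pair among tokens not yet merged away.
        masked = np.where(np.outer(alive, alive), sim, -np.inf)
        i, j = np.unravel_index(np.argmax(masked), masked.shape)
        merged[i] = (merged[i] + merged[j]) / 2  # average the pair into one token
        alive[j] = False                         # drop the absorbed token
    return merged[alive]

# Example: 6 tokens of dimension 4, merging 2 pairs leaves 4 tokens.
rng = np.random.default_rng(0)
x = rng.standard_normal((6, 4))
out = merge_most_similar_tokens(x, r=2)
print(out.shape)  # (4, 4)
```

Each merge removes one token, so attention cost shrinks with every transformer block that applies it.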
Our analysis shows that on CPU, ToMe delivers greater acceleration for Vision Transformers than Speedster has achieved so far. On GPU, ToMe improves ViT speed only at high batch sizes.
I think it would be useful to add ToMe to the set of techniques used by Speedster, to make CPU inference for this type of model even more efficient.
Useful materials
- Analysis ToMe vs Speedster notebook
- Analysis ToMe vs Speedster blog
- Meta AI's blog
- Meta AI's paper
- Meta AI's repo