Much lower training efficiency

Open ydhongHIT opened this issue 1 year ago • 6 comments

Thanks for your great work! However, I observe that the training efficiency (including both training speed and memory use) is much lower than that of a plain ViT with a similar model size. Do you have any insights on this phenomenon?

ydhongHIT avatar Mar 28 '24 07:03 ydhongHIT

I think the bottleneck lies in the iterative data generation on the CPU, which leads to low efficiency.

yrqUni avatar Mar 29 '24 09:03 yrqUni
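
If the suspicion is CPU-side data loading, one quick way to confirm it is to time the batch-fetch step separately from the forward/backward step. A minimal sketch in generic PyTorch (the `profile_loader_vs_compute` helper and its arguments are placeholders for illustration, not code from this repo):

```python
import time
import torch

def profile_loader_vs_compute(model, loader, criterion, optimizer, device, num_iters=50):
    """Split wall-clock time into 'waiting for data' vs 'GPU compute'."""
    model.train()
    data_time, compute_time = 0.0, 0.0
    it = iter(loader)
    for _ in range(num_iters):
        t0 = time.time()
        images, targets = next(it)          # blocks here if CPU workers lag behind
        t1 = time.time()

        images = images.to(device, non_blocking=True)
        targets = targets.to(device, non_blocking=True)
        loss = criterion(model(images), targets)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        torch.cuda.synchronize()            # wait for the GPU to finish this step
        t2 = time.time()

        data_time += t1 - t0
        compute_time += t2 - t1

    print(f"data: {data_time:.1f}s  compute: {compute_time:.1f}s")
```

If `data` dominates, raising `num_workers` (or using faster storage/decoding) should help; if `compute` dominates, the dataloader is not the problem.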

I got the same issue. Could anyone explain this phenomenon? In which code snippets does the CPU matter? 😢

Leopold2333 avatar Apr 02 '24 09:04 Leopold2333

Have you tried using a different number of workers? A smaller batch size (say, 128 per GPU) with 16 CPU workers looks fairly reasonable to me. I have tried training a Vim-Tiny on ImageNet-1k with 4x V100 (16 GB), amp enabled; it takes around 4 seconds to run 10 iterations and around 17 minutes to finish 1 epoch. GPU utilization is at 100%. Perhaps it has something to do with the GPU bandwidth.

zhuqiangLu avatar Apr 15 '24 04:04 zhuqiangLu
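
For what it's worth, those numbers are roughly self-consistent. A quick back-of-the-envelope check, assuming the standard 1.28M ImageNet-1k training images and a global batch of 4 × 128 = 512 inferred from the setup above:

```python
# Back-of-the-envelope check of the reported throughput (assumed numbers).
num_images   = 1_281_167          # ImageNet-1k training set
global_batch = 4 * 128            # 4 GPUs x 128 images each
sec_per_iter = 4.0 / 10           # "4 seconds per 10 iterations"

iters_per_epoch = num_images / global_batch
epoch_minutes   = iters_per_epoch * sec_per_iter / 60
print(f"{iters_per_epoch:.0f} iters/epoch, ~{epoch_minutes:.1f} min/epoch")
# -> ~2502 iters/epoch, ~16.7 min/epoch, close to the reported ~17 minutes
```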

The code sets the number of blocks to 24 for the small and tiny models, which is twice the depth of a normal ViT-Small/Tiny. And I really can't understand why.

jsrdcht avatar May 24 '24 03:05 jsrdcht
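
One plausible explanation (a guess, not stated by the authors here) is parameter matching: at the same width, a Mamba block has roughly half the parameters of a Transformer block, so doubling the depth to 24 keeps the total size close to ViT-Tiny/Small. A very rough sketch, assuming expand=2 for the Mamba block, an MLP ratio of 4 for the ViT block, and ignoring biases, norms, conv and SSM state parameters:

```python
# Very rough per-block parameter counts (illustrative assumptions only).
def vit_block_params(d):
    attn = 4 * d * d          # q, k, v, out projections
    mlp  = 2 * d * (4 * d)    # two linear layers with hidden size 4d
    return attn + mlp         # ~12 * d^2

def mamba_block_params(d, expand=2):
    d_inner  = expand * d
    in_proj  = d * (2 * d_inner)   # projects to the x and z branches
    out_proj = d_inner * d
    return in_proj + out_proj      # ~6 * d^2

d = 192  # tiny-scale embedding dim
print(vit_block_params(d), mamba_block_params(d))
# 12 ViT blocks ~= 24 Mamba blocks in parameter count at the same width.
```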

What does amp mean in this context? I can't find anything about it.

jorenterpstra avatar Nov 25 '24 22:11 jorenterpstra

Sorry for the confusion: amp means using Automatic Mixed Precision during training. In the training script, amp is disabled by default (the last line ends with --no-amp): https://github.com/hustvl/Vim/blob/16e1d81360fc53f30ec20a550b82f1200a3f0352/vim/scripts/ft-vim-t.sh#L5C1-L5C513

zhuqiangLu avatar Nov 25 '24 23:11 zhuqiangLu
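
For reference, a generic PyTorch AMP training step looks like the sketch below. This is not the repo's actual training loop (which toggles AMP through the --no-amp flag shown above), just an illustration of what enabling amp means:

```python
# Generic PyTorch AMP training step (illustrative only).
import torch

scaler = torch.cuda.amp.GradScaler()

def train_step(model, images, targets, criterion, optimizer):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():          # run the forward pass in mixed precision
        loss = criterion(model(images), targets)
    scaler.scale(loss).backward()            # scale the loss to avoid fp16 underflow
    scaler.step(optimizer)
    scaler.update()
    return loss.detach()
```

Mixed precision usually reduces memory use and can speed up training on GPUs with tensor cores, which is relevant to the efficiency concerns in this thread.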