evit
Python code for ICLR 2022 spotlight paper EViT: Expediting Vision Transformers via Token Reorganizations
I would like to train EViT-LVViT-S; how exactly should this be done? 1. For token distillation, do you distill only the tokens that are kept at the end, or all of the tokens? 2. Does the fused token also need to be distilled?
In `train_one_epoch` the following code is called: `with torch.cuda.amp.autocast(): outputs = model(samples, keep_rate); loss = criterion(samples, outputs, targets)`. But your model does not take a token argument, so as written this should raise an error. Where is this token argument actually passed?
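For reference, one way the snippet's calling convention could work is for `train_one_epoch` to forward the keep rate as a positional argument of the model. The following is only a sketch of that shape (the function name `train_step` and its signature are hypothetical, not the repo's actual code), with the AMP context guarded so it also runs on CPU:

```python
import contextlib
import torch

def train_step(model, criterion, samples, targets, keep_rate, optimizer):
    """Hypothetical single AMP training iteration in which the token
    keep rate is forwarded to the model as a positional argument.
    The real repo may instead bake the rate into the model; this only
    illustrates the calling convention the quoted snippet implies."""
    # use CUDA autocast when available, otherwise a no-op context
    amp_ctx = (torch.cuda.amp.autocast() if torch.cuda.is_available()
               else contextlib.nullcontext())
    with amp_ctx:
        outputs = model(samples, keep_rate)  # keep_rate passed explicitly
        loss = criterion(samples, outputs, targets)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```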
I found that `torch.no_grad()` is missing in `speed_test.py`, which significantly slows down the measured throughput. Is this intentional or a mistake?
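For context, inference benchmarks are usually wrapped in `torch.no_grad()` so that no autograd graph is recorded, which reduces both latency and memory. A minimal sketch of such a benchmark (the function name, defaults, and structure are illustrative, not the repo's `speed_test.py`):

```python
import time
import torch

@torch.no_grad()  # disable autograd: no graph is built during the forward pass
def measure_throughput(model, batch_size=64, image_size=224,
                       warmup=5, iters=20, device="cpu"):
    """Rough throughput estimate in images/second. Runs a few warmup
    iterations first, and synchronizes CUDA around the timed region
    so asynchronous kernels are fully accounted for."""
    model.eval().to(device)
    x = torch.randn(batch_size, 3, image_size, image_size, device=device)
    for _ in range(warmup):
        model(x)
    if device.startswith("cuda"):
        torch.cuda.synchronize()
    start = time.time()
    for _ in range(iters):
        model(x)
    if device.startswith("cuda"):
        torch.cuda.synchronize()
    return iters * batch_size / (time.time() - start)
```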
I'm not sure whether the value of k changes at every layer?
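For context, in an EViT-style design the number of kept tokens is recomputed from the current token count at each reorganization layer, so k shrinks with depth even though the keep rate itself is constant. A pure-Python sketch under assumed defaults (196 image tokens, reorganization at layers 3/6/9 as for a 12-layer DeiT-S, one fused token; all of these values are illustrative):

```python
import math

def kept_tokens_per_layer(num_tokens=196, keep_rate=0.7,
                          reorg_layers=(3, 6, 9), depth=12):
    """Track how many image tokens survive each layer when a
    `keep_rate` fraction of the attentive tokens is kept (plus one
    fused token) at the reorganization layers. The layer indices and
    token counts are assumptions for illustration only."""
    counts = []
    n = num_tokens
    for layer in range(depth):
        if layer in reorg_layers:
            # keep the top-k attentive tokens and fuse the rest into one token
            n = math.ceil(keep_rate * n) + 1
        counts.append(n)
    return counts
```

With these defaults the per-layer token count drops in steps (196 → 139 → 99 → 71), which is why k is different at each reorganization layer.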
Thanks for your open-source work. ![image](https://user-images.githubusercontent.com/38601195/174768831-3437727f-6089-4e7e-858a-3096232fb7a9.png) Following the finetune.sh configuration, I used a batch size of 8x256=2048, but the accuracy is only 79.35%.
Thanks for your code. I tested my own model with the code you provide in `speed_test.py`, but I encountered this problem. https://github.com/youweiliang/evit/blob/97e58f610c51d4b74a070341739e41647dced32c/speed_test.py#L106 What is the reason for it? How to deal...
Hello, I would like to ask: if the warmup strategy is not used and the keep rate is instead set directly to the target value, will the experimental results differ greatly?
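The kind of keep-rate warmup being asked about can be sketched as a simple linear anneal from 1.0 (all tokens kept) down to the target rate over the first part of training. The function name, the 100-epoch default, and the linear shape are assumptions for illustration, not necessarily the repo's actual schedule:

```python
def keep_rate_at_epoch(epoch, base_rate=0.7, warmup_epochs=100):
    """Linearly anneal the token keep rate from 1.0 to `base_rate`
    over the first `warmup_epochs` epochs, then hold it constant.
    Parameter names and defaults are illustrative."""
    if epoch >= warmup_epochs:
        return base_rate
    frac = epoch / warmup_epochs
    return 1.0 - frac * (1.0 - base_rate)
```

Starting from a keep rate of 1.0 lets early training see all tokens before pruning kicks in; skipping the warmup simply means the model is trained with the target rate from the first epoch.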
Hello, when I run `bash ./run_code.sh`, the following error occurs; how can I solve this problem? `Traceback (most recent call last): File "main.py", line 497, in main(args) File...`
Your research is very meaningful. But why does EViT not perform as well when I reduce the batch size? I hope you can dispel my doubts.
In the paper, EViT with an oracle obtains higher accuracy when trained for more epochs. Similar results are also shown in the DeiT paper. Thus I think the comparison is not...