
Pre-trained weights?

Open hzhang57 opened this issue 3 years ago • 11 comments

Hi, I want to extend the model on my own task, will you release pre-trained weights?

hzhang57 avatar May 01 '21 02:05 hzhang57

Because of our institution's policy, we cannot send the pre-trained models out directly. We plan to find some GPU servers outside, but that will take time, so we're afraid the models will not be released soon.

danczs avatar May 01 '21 04:05 danczs

Hi, I trained a model with the provided code on ImageNet-1k only, using 4x 2080Ti (batch size 100), and it finally reached around 82.0. I uploaded this temporary alternative to Google Drive to help anyone who needs it: https://drive.google.com/drive/folders/18GpH1SeVOsq3_2QGTA5Z_3O1UFtKugEu?usp=sharing I also suspect the model has further potential if pre-trained on ImageNet-21k.
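A minimal sketch of how one might load the shared checkpoint; this assumes models.py in this repo exposes a visformer_small() constructor and that the Drive file is a plain state_dict (or a dict wrapping one under a 'model' key) saved with torch.save:

```python
import torch
from models import visformer_small  # assumed constructor in this repo's models.py

model = visformer_small()
checkpoint = torch.load('visformer_small.pth', map_location='cpu')  # hypothetical filename
# Some checkpoints wrap the weights, e.g. under a 'model' key; unwrap if needed.
if isinstance(checkpoint, dict) and 'model' in checkpoint:
    checkpoint = checkpoint['model']
model.load_state_dict(checkpoint)
model.eval()
```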

hzhang57 avatar May 10 '21 02:05 hzhang57

That's great! I will add it to the readme for anyone else who needs it. Thanks a lot!

danczs avatar May 11 '21 01:05 danczs

> I trained a model

Assuming this is Visformer-small?

amaarora avatar May 18 '21 02:05 amaarora

Yes, I trained Visformer-small at 224: visformer_small

hzhang57 avatar May 18 '21 02:05 hzhang57

@danczs @amaarora

Thanks for sharing your work! I really like the architecture and the experiments you did; they helped me see how convolutional layers can improve the performance of transformer models.

I trained Visformer-tiny at 224. If I upload the pre-trained weights, would that help other researchers? When I trained Visformer-tiny, its top-1 accuracy reached 78.3%, and 78.1% with the weights saved at the last epoch.

developer0hye avatar Oct 01 '21 00:10 developer0hye

Thanks for your attention! Right now only the Visformer-small weights are available, so I think the tiny weights would be helpful to some people. By the way, for the tiny model, setting '--drop-path=0.0' can slightly improve performance.

danczs avatar Oct 01 '21 12:10 danczs

@danczs

I trained the model with the command below, with '--drop-path' set to 0.0.

python -m torch.distributed.launch --nproc_per_node=4 --use_env main.py --model visformer_tiny --batch-size 256 --drop-path 0.0 --data-path /path/to/imagenet --output_dir /path/to/save

Please check my weights and share this link in the readme file!

https://drive.google.com/file/d/1LLBGbj7-ok1fDvvMCab-Fn5T3cjTzOKB/view?usp=sharing

developer0hye avatar Oct 01 '21 13:10 developer0hye

I have added it. Thanks for sharing! In addition, we will slightly update the model in the next few days so that Visformer can use AMP. After that, the old weights may not work well; we will test them and report the results here. Thanks!

danczs avatar Oct 01 '21 13:10 danczs

@danczs Okay! Thanks!

developer0hye avatar Oct 01 '21 13:10 developer0hye

By slightly adjusting the model, Visformer can now use AMP. During inference, the old weights can use AMP as well. See the README for details.
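A minimal sketch of AMP inference with the updated model, assuming models.py exposes a visformer_small() constructor and a CUDA device is available:

```python
import torch
from models import visformer_small  # assumed constructor in this repo's models.py

model = visformer_small().cuda().eval()
x = torch.randn(1, 3, 224, 224, device='cuda')  # dummy 224x224 input batch
# autocast runs the forward pass in mixed precision at inference time
with torch.no_grad(), torch.cuda.amp.autocast():
    logits = model(x)
print(logits.float().argmax(dim=1))
```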

danczs avatar Oct 12 '21 07:10 danczs