
Pre-trained weights?

Open hzhang57 opened this issue 3 years ago • 11 comments

Hi, I want to extend the model on my own task, will you release pre-trained weights?

hzhang57 avatar May 01 '21 02:05 hzhang57

Because of our institution's policy, we cannot send the pre-trained models out directly. We plan to find some GPU servers outside, but that will take time, so we're afraid the models will not be released soon.

danczs avatar May 01 '21 04:05 danczs

Hi, I trained a model with the provided code on ImageNet-1k only, using 4x 2080Ti (batch size 100), and it finally reached around 82.0. I uploaded this temporary alternative to Google Drive to help anyone who needs it: https://drive.google.com/drive/folders/18GpH1SeVOsq3_2QGTA5Z_3O1UFtKugEu?usp=sharing I also suspect the model has further potential if pre-trained on ImageNet-21k.
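A minimal sketch of how one might load the shared checkpoint; this assumes models.py in this repo exposes a visformer_small() constructor and that the Drive file is a plain state_dict (or a dict wrapping one under a 'model' key) saved with torch.save:

```python
import torch
from models import visformer_small  # assumed constructor in this repo's models.py

model = visformer_small()
checkpoint = torch.load('visformer_small.pth', map_location='cpu')  # hypothetical filename
# Some checkpoints wrap the weights, e.g. under a 'model' key; unwrap if needed.
if isinstance(checkpoint, dict) and 'model' in checkpoint:
    checkpoint = checkpoint['model']
model.load_state_dict(checkpoint)
model.eval()
```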

hzhang57 avatar May 10 '21 02:05 hzhang57

That's great! I will add it to the readme for anyone else who needs it. Thanks a lot!

danczs avatar May 11 '21 01:05 danczs

> I trained a model

Assuming this is Visformer-small?

amaarora avatar May 18 '21 02:05 amaarora

Yes, I trained Visformer-small at 224: visformer_small

hzhang57 avatar May 18 '21 02:05 hzhang57

@danczs @amaarora

Thanks for sharing your work! I really like the architecture and the experiments you did; they helped me see how convolutional layers can improve the performance of transformer models.

I trained Visformer-tiny at 224. If I upload the pre-trained weights, would that help other researchers? When I trained Visformer-tiny, its top-1 accuracy reached 78.3%, and 78.1% with the weights saved at the last epoch.

developer0hye avatar Oct 01 '21 00:10 developer0hye

Thanks for your attention! Right now only the Visformer-small weights are available, so I think the tiny weights would be helpful to some people. By the way, for the tiny model, setting '--drop-path=0.0' can slightly improve performance.

danczs avatar Oct 01 '21 12:10 danczs

@danczs

I trained the model with the command below, with '--drop-path' set to 0.0.

python -m torch.distributed.launch --nproc_per_node=4 --use_env main.py --model visformer_tiny --batch-size 256 --drop-path 0.0 --data-path /path/to/imagenet --output_dir /path/to/save

Please check my weights and share this link in the readme file!

https://drive.google.com/file/d/1LLBGbj7-ok1fDvvMCab-Fn5T3cjTzOKB/view?usp=sharing

developer0hye avatar Oct 01 '21 13:10 developer0hye

I have added it. Thanks for sharing! In addition, we will slightly update the model in the next few days so that Visformer can use AMP. After that, the old weights may not work well; we will test them and report the results here. Thanks!

danczs avatar Oct 01 '21 13:10 danczs

@danczs Okay! Thanks!

developer0hye avatar Oct 01 '21 13:10 developer0hye

By slightly adjusting the model, Visformer can now use AMP. During inference, the old weights can use AMP as well. See the README for details.
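A minimal sketch of AMP inference with the updated model, assuming models.py exposes a visformer_small() constructor and a CUDA device is available:

```python
import torch
from models import visformer_small  # assumed constructor in this repo's models.py

model = visformer_small().cuda().eval()
x = torch.randn(1, 3, 224, 224, device='cuda')  # dummy 224x224 input batch
# autocast runs the forward pass in mixed precision at inference time
with torch.no_grad(), torch.cuda.amp.autocast():
    logits = model(x)
print(logits.float().argmax(dim=1))
```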

danczs avatar Oct 12 '21 07:10 danczs