axial-deeplab
re-implement Stand-Alone Self-Attention model
Hi, @csrhddlam As we discussed before, I am trying to re-implement the baseline "Conv-stem + Attention" from Stand-Alone Self-Attention in Vision Models, which is referenced in your paper. Could you please help check the correctness? It would be even better if you could suggest further optimizations for this implementation. Thanks!
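For context, an unfold-based local self-attention layer along the lines discussed here can be sketched as below. This is a minimal single-head sketch using `F.unfold` to gather each pixel's local window; the kernel size, absence of relative position embeddings, and layer name are illustrative assumptions, not the exact configuration of the paper or this repo.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LocalSelfAttention2d(nn.Module):
    """Sketch of local self-attention over k x k windows via unfolding.

    Single head, no relative position embedding -- simplifying
    assumptions for illustration only.
    """
    def __init__(self, in_channels, out_channels, kernel_size=7):
        super().__init__()
        self.kernel_size = kernel_size
        self.padding = kernel_size // 2
        # 1x1 convolutions produce per-pixel queries, keys, and values
        self.query = nn.Conv2d(in_channels, out_channels, 1, bias=False)
        self.key = nn.Conv2d(in_channels, out_channels, 1, bias=False)
        self.value = nn.Conv2d(in_channels, out_channels, 1, bias=False)

    def forward(self, x):
        b, _, h, w = x.shape
        q = self.query(x)
        k = self.key(x)
        v = self.value(x)
        c = q.shape[1]
        win = self.kernel_size ** 2
        # Unfold gathers each pixel's k x k neighborhood. This is the
        # memory-hungry step: output is (b, c * win, h * w), i.e. every
        # key/value is duplicated win times.
        k = F.unfold(k, self.kernel_size, padding=self.padding)
        v = F.unfold(v, self.kernel_size, padding=self.padding)
        k = k.view(b, c, win, h * w)
        v = v.view(b, c, win, h * w)
        q = q.view(b, c, 1, h * w)
        # Scaled dot-product attention within each local window
        attn = (q * k).sum(dim=1, keepdim=True) / (c ** 0.5)
        attn = attn.softmax(dim=2)                # over the window positions
        out = (attn * v).sum(dim=2)               # (b, c, h * w)
        return out.view(b, c, h, w)
```

The duplication introduced by `F.unfold` (a factor of `kernel_size**2` on the keys and values) is what makes this formulation memory-consuming at 224x224 resolution, as noted in the following comments.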
Hi, @d-li14 Thanks for contributing. It looks correct to me, but the unfolding implementation could take a lot of memory. Could you check if the model really runs on 224x224 images and if it can reproduce the results in the paper? Thanks!
Yes, it is very memory-consuming: a simple test shows that more than 7 GB of memory is used with 8 images per GPU. I will try to verify the accuracy of this model.