
re-implement Stand-Alone Self-Attention model

Open d-li14 opened this issue 4 years ago • 2 comments

Hi, @csrhddlam As we discussed before, I am trying to re-implement the baseline "Conv-stem + Attention" from Stand-Alone Self-Attention in Vision Models, which is referred to in your paper. Could you please help check its correctness? It would be great if you could also suggest further optimizations of this implementation. Thanks!

d-li14 avatar Sep 06 '20 13:09 d-li14

Hi, @d-li14 Thanks for contributing. It looks correct to me, but the unfolding-based implementation could take a lot of memory. Could you check whether the model actually runs on 224x224 images and whether it reproduces the results in the paper? Thanks!

csrhddlam avatar Sep 06 '20 15:09 csrhddlam

Yes, it is very memory-intensive: a quick test shows that more than 7 GB of memory is used with 8 images per GPU. I will try to verify the accuracy of this model.
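For reference, here is a minimal numpy sketch of the unfold-based local self-attention being discussed (not the actual code from this thread; the single head, the projection matrices `Wq`/`Wk`/`Wv`, and the omission of SASA's relative positional embeddings are simplifying assumptions). It makes the memory cost concrete: the unfolded keys and values each materialize a `k*k`-fold copy of the feature map, so with a 7x7 window each is 49x the size of the input tensor, which is consistent with the multi-GB usage reported above.

```python
import numpy as np

def unfold(x, k):
    """Extract k x k neighborhoods around every pixel, with zero padding.

    x: (C, H, W) -> out: (C, k*k, H, W).
    The output is a k*k-fold copy of the input, which is the source of
    the memory blow-up discussed in this thread.
    """
    C, H, W = x.shape
    p = k // 2
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))
    out = np.empty((C, k * k, H, W), dtype=x.dtype)
    for i in range(k):
        for j in range(k):
            out[:, i * k + j] = xp[:, i:i + H, j:j + W]
    return out

def local_self_attention(x, Wq, Wk, Wv, k=7):
    """Single-head local self-attention over k x k windows (sketch only).

    x: (C, H, W); Wq, Wk, Wv: (C, C) hypothetical projection matrices.
    Relative positional embeddings (used in the SASA paper) are omitted.
    """
    C, H, W = x.shape
    q = np.einsum('dc,chw->dhw', Wq, x)               # queries: (C, H, W)
    ks = unfold(np.einsum('dc,chw->dhw', Wk, x), k)   # keys:    (C, k*k, H, W)
    vs = unfold(np.einsum('dc,chw->dhw', Wv, x), k)   # values:  (C, k*k, H, W)
    logits = np.einsum('chw,cnhw->nhw', q, ks) / np.sqrt(C)
    a = np.exp(logits - logits.max(axis=0, keepdims=True))
    a /= a.sum(axis=0, keepdims=True)                 # softmax over the window
    return np.einsum('nhw,cnhw->chw', a, vs)          # output: (C, H, W)
```

A memory-friendlier alternative would compute attention per spatial block instead of unfolding the whole map at once, at the cost of a more complex implementation.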

d-li14 avatar Sep 06 '20 16:09 d-li14