Namuk Park
Hi, I was inspired by ["Convolutional Self-Attention Networks"](https://arxiv.org/abs/1904.03107) [2] and implemented the two-dimensional `ConViT` model for vision tasks from scratch. Yang et al. [2] mainly proposed one-dimensional convolutional transformers for...
`Attention2d` in `models/attentions.py` is traditional _global_ self-attention. `ConvAttention2d` in `models/convit.py` is _convolutional_ self-attention, a form of _local_ self-attention: it computes self-attention only between tokens within a convolutional receptive...
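The global-versus-local distinction can be sketched as follows. This is a minimal toy illustration of local (windowed) self-attention, not the actual `ConvAttention2d` implementation; projections and multi-head logic are omitted for brevity:

```python
import torch
import torch.nn.functional as F

def local_self_attention(x, window=3):
    """Toy local self-attention: each token attends only to the tokens
    inside its `window` x `window` receptive field (cf. a Conv kernel).
    x: (B, C, H, W). Q/K/V projections omitted for brevity."""
    B, C, H, W = x.shape
    pad = window // 2
    # Gather each token's window x window neighborhood: (B, C * k*k, H*W)
    neighbors = F.unfold(x, kernel_size=window, padding=pad)
    neighbors = neighbors.view(B, C, window * window, H * W)  # (B, C, k*k, HW)
    q = x.view(B, C, 1, H * W)                                # (B, C, 1, HW)
    # Scaled dot-product scores over the local window only
    scores = (q * neighbors).sum(dim=1) / C ** 0.5            # (B, k*k, HW)
    attn = scores.softmax(dim=1)
    out = (neighbors * attn.unsqueeze(1)).sum(dim=2)          # (B, C, HW)
    return out.view(B, C, H, W)
```

Global self-attention would instead let every token attend to all `H * W` tokens, which is what restricting the scores to the unfolded neighborhood avoids.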
Yes. I'd really appreciate it if you would cite my paper.
@dinhanhx Oh! Sorry for the confusion. `Attention2d` in `models/attentions.py` is almost identical to traditional MSA in vanilla ViT, so I think you should cite the original ViT paper. Please cite...
@dinhanhx Ah, I think I now understand what you pointed out! I initially used _two Convs_ for `qkv` to improve the performance of [AlterNet](https://github.com/xxxnell/how-do-vits-work/blob/970b807a51a7a0014ced882dc3f0633e99566dda/models/alternet.py#L19-L52). So there was an experiment and...
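The two design choices being discussed can be sketched as below. These classes are hypothetical illustrations of "two Convs for `qkv`" versus a single fused projection, not the actual AlterNet code:

```python
import torch
import torch.nn as nn

class TwoConvQKV(nn.Module):
    """Illustrative: produce q with one 1x1 Conv and k, v with another."""
    def __init__(self, dim):
        super().__init__()
        self.q = nn.Conv2d(dim, dim, kernel_size=1)
        self.kv = nn.Conv2d(dim, dim * 2, kernel_size=1)

    def forward(self, x):
        q = self.q(x)
        k, v = self.kv(x).chunk(2, dim=1)
        return q, k, v

class FusedQKV(nn.Module):
    """Illustrative: produce q, k, v with a single fused 1x1 Conv."""
    def __init__(self, dim):
        super().__init__()
        self.qkv = nn.Conv2d(dim, dim * 3, kernel_size=1)

    def forward(self, x):
        return self.qkv(x).chunk(3, dim=1)
```

Both variants have the same parameter count; the difference is mainly one of code organization and how the projections can be configured independently.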
@dinhanhx Right. I think what you said is one of the best practices.
Hi @longyuewangdcu , Thank you for the great paper and your kind words, and I'm sorry I missed that implementation. I starred the repository, and I'll take a closer look!
Thank you for your support and insightful question! In our observation, the attributes of Conv depend primarily on the architecture or the group size (e.g., depthwise-separable Conv) rather than the...
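The "group size" attribute mentioned above corresponds to the `groups` argument of `nn.Conv2d` in PyTorch. A minimal sketch contrasting a depthwise-separable Conv with a standard Conv (illustrative, not code from the repository):

```python
import torch
import torch.nn as nn

# Depthwise-separable Conv: `groups` equal to the channel count means
# each channel is convolved independently (spatial mixing only),
# followed by a 1x1 Conv for channel mixing.
depthwise = nn.Conv2d(32, 32, kernel_size=3, padding=1, groups=32)
pointwise = nn.Conv2d(32, 64, kernel_size=1)
dw_separable = nn.Sequential(depthwise, pointwise)

# Standard Conv: full channel mixing inside the 3x3 kernel (groups=1).
standard = nn.Conv2d(32, 64, kernel_size=3, padding=1)

def n_params(module):
    return sum(p.numel() for p in module.parameters())

# The depthwise-separable version uses far fewer parameters for the
# same input/output shapes.
```

Varying `groups` between these two extremes interpolates between per-channel and fully mixed convolutions, which is the attribute the comment refers to.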
Hi @Dong1P , Thank you for your support. I have not yet released the code for the Hessian eigenvalue spectra visualizations (e.g., Figs. 1c and 4). Instead, I provide some...
Hi @yukimmmmiao , thank you for the kind words. I assumed that the largest Hessian eigenvalues have a dominant influence on optimization ([Ghorbani et al. (ICML 2019)](https://arxiv.org/abs/1901.10159)). See also [Liu...
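The visualization code is not released, but the underlying quantity can be estimated with power iteration on Hessian-vector products, in the spirit of Ghorbani et al. This is a minimal sketch I wrote for illustration, not the paper's code:

```python
import torch

def top_hessian_eigenvalue(loss, params, iters=50):
    """Estimate the largest Hessian eigenvalue (in magnitude) of `loss`
    w.r.t. `params` via power iteration on Hessian-vector products."""
    # First-order gradients with the graph kept for double backprop
    grads = torch.autograd.grad(loss, params, create_graph=True)
    v = [torch.randn_like(p) for p in params]
    vnorm = torch.sqrt(sum((u * u).sum() for u in v))
    v = [u / vnorm for u in v]
    for _ in range(iters):
        # Hessian-vector product: differentiate (grad . v) w.r.t. params
        hv = torch.autograd.grad(grads, params, grad_outputs=v,
                                 retain_graph=True)
        norm = torch.sqrt(sum((h * h).sum() for h in hv))
        v = [h / (norm + 1e-12) for h in hv]
    # After convergence, ||Hv|| with unit v approximates |lambda_max|
    return norm.item()
```

Sweeping many such eigenvalue estimates (e.g., via stochastic Lanczos methods as in Ghorbani et al.) is what produces full Hessian spectra like those in the figures.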