Namuk Park
Hi, I was inspired by ["Convolutional Self-Attention Networks"](https://arxiv.org/abs/1904.03107) [2] and implemented the two-dimensional `ConViT` model for vision tasks from scratch. Yang et al. [2] mainly proposed one-dimensional convolutional transformers for...
`Attention2d` in `models/attentions.py` is traditional _global_ self-attention. `ConvAttention2d` in `models/convit.py` is _convolutional_ self-attention, a form of _local_ self-attention: it computes self-attention only between tokens within a convolutional receptive...
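The global-versus-local distinction can be sketched as follows. This is a minimal toy illustration of local (windowed) self-attention, not the actual `ConvAttention2d` implementation; projections and multi-head logic are omitted for brevity:

```python
import torch
import torch.nn.functional as F

def local_self_attention(x, window=3):
    """Toy local self-attention: each token attends only to the tokens
    inside its `window` x `window` receptive field (cf. a Conv kernel).
    x: (B, C, H, W). Q/K/V projections omitted for brevity."""
    B, C, H, W = x.shape
    pad = window // 2
    # Gather each token's window x window neighborhood: (B, C * k*k, H*W)
    neighbors = F.unfold(x, kernel_size=window, padding=pad)
    neighbors = neighbors.view(B, C, window * window, H * W)  # (B, C, k*k, HW)
    q = x.view(B, C, 1, H * W)                                # (B, C, 1, HW)
    # Scaled dot-product scores over the local window only
    scores = (q * neighbors).sum(dim=1) / C ** 0.5            # (B, k*k, HW)
    attn = scores.softmax(dim=1)
    out = (neighbors * attn.unsqueeze(1)).sum(dim=2)          # (B, C, HW)
    return out.view(B, C, H, W)
```

Global self-attention would instead let every token attend to all `H * W` tokens, which is what restricting the scores to the unfolded neighborhood avoids.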
Yes. I'd really appreciate it if you would cite my paper.
@dinhanhx Oh! Sorry for the confusion. `Attention2d` in `models/attentions.py` is almost identical to traditional MSA in vanilla ViT, so I think you should cite the original ViT paper. Please cite...
@dinhanhx Ah, I think I now understand what you pointed out! I initially used _two Convs_ for `qkv` to improve the performance of [AlterNet](https://github.com/xxxnell/how-do-vits-work/blob/970b807a51a7a0014ced882dc3f0633e99566dda/models/alternet.py#L19-L52). So there was an experiment and...
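The two design choices being discussed can be sketched as below. These classes are hypothetical illustrations of "two Convs for `qkv`" versus a single fused projection, not the actual AlterNet code:

```python
import torch
import torch.nn as nn

class TwoConvQKV(nn.Module):
    """Illustrative: produce q with one 1x1 Conv and k, v with another."""
    def __init__(self, dim):
        super().__init__()
        self.q = nn.Conv2d(dim, dim, kernel_size=1)
        self.kv = nn.Conv2d(dim, dim * 2, kernel_size=1)

    def forward(self, x):
        q = self.q(x)
        k, v = self.kv(x).chunk(2, dim=1)
        return q, k, v

class FusedQKV(nn.Module):
    """Illustrative: produce q, k, v with a single fused 1x1 Conv."""
    def __init__(self, dim):
        super().__init__()
        self.qkv = nn.Conv2d(dim, dim * 3, kernel_size=1)

    def forward(self, x):
        return self.qkv(x).chunk(3, dim=1)
```

Both variants have the same parameter count; the difference is mainly one of code organization and how the projections can be configured independently.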
@dinhanhx Right. I think what you said is one of the best practices.
Hi @longyuewangdcu , Thank you for the great paper and your kind words, and I'm sorry I missed that implementation. I starred the repository, and I'll take a closer look!
Thank you for your support and insightful question! In our observation, the attributes of Conv depend primarily on the architecture or the group size (e.g., depthwise-separable Conv) rather than the...
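The "group size" attribute mentioned above corresponds to the `groups` argument of `nn.Conv2d` in PyTorch. A minimal sketch contrasting a depthwise-separable Conv with a standard Conv (illustrative, not code from the repository):

```python
import torch
import torch.nn as nn

# Depthwise-separable Conv: `groups` equal to the channel count means
# each channel is convolved independently (spatial mixing only),
# followed by a 1x1 Conv for channel mixing.
depthwise = nn.Conv2d(32, 32, kernel_size=3, padding=1, groups=32)
pointwise = nn.Conv2d(32, 64, kernel_size=1)
dw_separable = nn.Sequential(depthwise, pointwise)

# Standard Conv: full channel mixing inside the 3x3 kernel (groups=1).
standard = nn.Conv2d(32, 64, kernel_size=3, padding=1)

def n_params(module):
    return sum(p.numel() for p in module.parameters())

# The depthwise-separable version uses far fewer parameters for the
# same input/output shapes.
```

Varying `groups` between these two extremes interpolates between per-channel and fully mixed convolutions, which is the attribute the comment refers to.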
Hi @Dong1P , Thank you for your support. I have not yet released the code for the Hessian eigenvalue spectra visualizations (e.g., Figs. 1c and 4). Instead, I provide some...
Hi @yukimmmmiao , thank you for the kind words. I assumed that the largest Hessian eigenvalues have a dominant influence on optimization ([Ghorbani et al. (ICML 2019)](https://arxiv.org/abs/1901.10159)). See also [Liu...
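The visualization code is not released, but the underlying quantity can be estimated with power iteration on Hessian-vector products, in the spirit of Ghorbani et al. This is a minimal sketch I wrote for illustration, not the paper's code:

```python
import torch

def top_hessian_eigenvalue(loss, params, iters=50):
    """Estimate the largest Hessian eigenvalue (in magnitude) of `loss`
    w.r.t. `params` via power iteration on Hessian-vector products."""
    # First-order gradients with the graph kept for double backprop
    grads = torch.autograd.grad(loss, params, create_graph=True)
    v = [torch.randn_like(p) for p in params]
    vnorm = torch.sqrt(sum((u * u).sum() for u in v))
    v = [u / vnorm for u in v]
    for _ in range(iters):
        # Hessian-vector product: differentiate (grad . v) w.r.t. params
        hv = torch.autograd.grad(grads, params, grad_outputs=v,
                                 retain_graph=True)
        norm = torch.sqrt(sum((h * h).sum() for h in hv))
        v = [h / (norm + 1e-12) for h in hv]
    # After convergence, ||Hv|| with unit v approximates |lambda_max|
    return norm.item()
```

Sweeping many such eigenvalue estimates (e.g., via stochastic Lanczos methods as in Ghorbani et al.) is what produces full Hessian spectra like those in the figures.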