askerlee

19 issues by askerlee

Hi Phil, thanks for the great repo. I compared your implementation of ViT with huggingface's (https://github.com/huggingface/transformers/blob/master/src/transformers/models/vit/modeling_vit.py) and found some subtle differences. In particular: 1) in the attention module, huggingface's vit...

We know that two linear transformations in a row can be merged into a single linear transformation if there's no activation function between them. In https://github.com/microsoft/Swin-Transformer/blob/main/models/swin_transformer.py#L141-L142 ``` x = (attn @...
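For reference, a minimal sketch of how such a merge works in general (hypothetical layer sizes, plain PyTorch; not the Swin code itself):

```python
import torch
import torch.nn as nn

# Two stacked linear layers with no activation in between (dims are made up).
lin1 = nn.Linear(64, 128)
lin2 = nn.Linear(128, 32)

# Merge: y = W2 (W1 x + b1) + b2 = (W2 W1) x + (W2 b1 + b2)
merged = nn.Linear(64, 32)
with torch.no_grad():
    merged.weight.copy_(lin2.weight @ lin1.weight)
    merged.bias.copy_(lin2.weight @ lin1.bias + lin2.bias)

x = torch.randn(4, 64)
assert torch.allclose(lin2(lin1(x)), merged(x), atol=1e-5)
```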

So that people can play with your model 😋

eval_all.m assumes there are 4 GPUs on the user's PC. However, when there are fewer than 4 GPUs, matcaffe will crash. At line 61: `gpu_id_array=[0:3]; %multi-GPUs for parfor testing` The array...

In the "Improved Denoising Diffusion Probabilistic Models" paper, the authors claim that cosine schedule of beta makes alpha-bar change more smoothly, leading to better results. Then I wonder why not...

Thank you for this wonderful work! May I know what's the benefit of using DropPath during training? Have you done any ablation study?
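For readers unfamiliar with it, DropPath (stochastic depth) randomly drops the entire residual branch per sample during training. A minimal sketch in the spirit of the timm implementation (class name and defaults are illustrative):

```python
import torch
import torch.nn as nn

class DropPath(nn.Module):
    """Drop the whole residual branch per sample with probability p."""
    def __init__(self, p=0.1):
        super().__init__()
        self.p = p

    def forward(self, x):
        if not self.training or self.p == 0.0:
            return x
        keep = 1.0 - self.p
        # One Bernoulli draw per sample; broadcast over remaining dims.
        mask = x.new_empty(x.shape[0], *([1] * (x.dim() - 1))).bernoulli_(keep)
        return x * mask / keep  # rescale so the expected value is unchanged

# Typical use inside a residual block: x = x + drop_path(block(x))
```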

Hi Zach, In your paper you evaluated the lookup radius r from 1 to 4. Have you tried larger values, say 6 or 8? May I know what's the consideration...
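For context, in RAFT the lookup radius r defines a (2r+1) x (2r+1) window of correlation samples around each pixel at every pyramid level, so the per-pixel lookup cost grows quadratically with r. A rough sketch of just the sampling-grid offsets (PyTorch; not the actual RAFT code):

```python
import torch

def lookup_grid(r):
    # Offsets of the (2r+1) x (2r+1) local window sampled from the
    # correlation volume around each pixel, one window per pyramid level.
    d = torch.arange(-r, r + 1, dtype=torch.float32)
    dy, dx = torch.meshgrid(d, d, indexing="ij")
    return torch.stack([dx, dy], dim=-1)  # shape: (2r+1, 2r+1, 2)

for r in (1, 4, 6, 8):
    print(r, lookup_grid(r).shape[0] ** 2)  # r=4 -> 81 samples per level
```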

Hi, I'm using ResNet101_BN_SCALE_Merged_OHEM on my own dataset. Some of the output losses (loss_bbox and loss_cls) are always 0. Update: it seems there is something wrong with OHEM. When I turn...

Just came across your paper, and found that the formulation of co-attention is quite similar to transformers: ![image](https://user-images.githubusercontent.com/1575461/189848213-203d8f4c-2664-4c59-966c-86433376bc3f.png) In particular, it shares a few (but not all) major ingredients, i.e., the Q and V projections,...
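For comparison, here is a minimal sketch of the standard transformer attention ingredients (single head, cross-attention form, dimensions made up):

```python
import torch
import torch.nn as nn

class SingleHeadAttention(nn.Module):
    # Standard ingredients for comparison: Q, K, V projections
    # followed by scaled dot-product attention.
    def __init__(self, dim=64):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)
        self.scale = dim ** -0.5

    def forward(self, x, y):
        # Queries from x, keys/values from y, loosely analogous
        # to co-attending two input sources.
        attn = (self.q(x) @ self.k(y).transpose(-2, -1)) * self.scale
        return attn.softmax(dim=-1) @ self.v(y)
```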

Thanks for this nice list. However, it seems to have stopped updating about 3 years ago. I wonder whether the author has become less interested in this topic? I've recently become quite interested in it,...