Ye

7 comments by Ye

> Hello,
>
> I trained a model with the default parameters and also noticed the same issue. The pretrained model that is available from the link on the description...

That's true. Also, the value of $B$ should be $2^{10}$.

There is no non-zero solution; only the trivial all-zero solution exists.

I had the same question. I think it is caused by the @register_model decorator in models/vision_transformer.py. See https://blog.csdn.net/weixin_47994925/article/details/129745845 and https://zhuanlan.zhihu.com/p/616239771.
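To illustrate why a decorator like @register_model can cause this kind of conflict, here is a minimal sketch of the registry pattern such decorators typically use. This is an assumption about the general mechanism, not the actual code in models/vision_transformer.py or timm; the registry dict and function names below are hypothetical:

```python
# Hypothetical sketch of a register_model-style decorator.
# Each decorated factory is stored in a module-level registry keyed by
# function name, so importing two modules that both register the same
# model name triggers a conflict (or silently overwrites an entry).
_model_registry = {}

def register_model(fn):
    name = fn.__name__
    if name in _model_registry:
        # Real libraries may warn or overwrite instead of raising.
        raise ValueError(f"Model '{name}' is already registered")
    _model_registry[name] = fn
    return fn

@register_model
def vit_base_patch16_224(**kwargs):
    # Toy factory: return a config dict instead of an actual model.
    return {"embed_dim": 768, "depth": 12, **kwargs}

def create_model(name, **kwargs):
    # Look up the registered factory and call it.
    return _model_registry[name](**kwargs)
```

Under this sketch, importing a local vision_transformer.py alongside a library that already registered the same model names would hit the duplicate-name path, which matches the errors discussed in the linked posts.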

True. In this pre-norm implementation, the norm-and-add procedure in the FFN part should be the same as in the self-attention part.
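A minimal sketch of the symmetric pre-norm structure being described, with NumPy and generic attn/ffn callables standing in for the real sublayers (these names and the helper layer_norm are assumptions for illustration, not the repo's implementation):

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Normalize over the last (feature) dimension.
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def pre_norm_block(x, attn, ffn):
    # Pre-norm: normalize BEFORE each sublayer, then add the residual.
    # The FFN branch mirrors the self-attention branch exactly.
    x = x + attn(layer_norm(x))
    x = x + ffn(layer_norm(x))
    return x
```

The point of the comment is that both residual branches follow the same norm-then-sublayer-then-add pattern; applying the norm after the FFN while keeping it before the attention would break that symmetry.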