333caowei comments

Results: 2 comments of 333caowei

How were the roles of the different attention layers analyzed in Figure 7 of the paper?

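> Hello, we experimented with both approaches:
>
> 1. Using the open-source IPA weights and injecting them only into specific layers.
> 2. Retraining on open-source data, but training only specific layers.
>
> We found the results to be very similar, so we did not release the weights we trained ourselves. The figures in the paper were generated with the first approach.

A very interesting finding. In a DiT model, I followed the same procedure and injected IPA into only one block at a time, but it is hard to reproduce the style effect seen with XL.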
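
The one-layer-at-a-time experiment discussed in this comment can be sketched as a sweep over per-block scales. This is only an illustration under stated assumptions, not the authors' code: the block names and both helper functions are hypothetical, and wiring the scales into an actual model is left out.

```python
from typing import Dict, Iterable, List


def single_block_scales(
    block_names: Iterable[str], active: str, scale: float = 1.0
) -> Dict[str, float]:
    """Return a per-block scale map: `scale` for the active block, 0.0 elsewhere."""
    names = list(block_names)
    if active not in names:
        raise ValueError(f"unknown block: {active!r}")
    return {name: (scale if name == active else 0.0) for name in names}


def sweep(block_names: Iterable[str], scale: float = 1.0) -> List[Dict[str, float]]:
    """Build one scale map per block so each block can be tested in isolation."""
    names = list(block_names)
    return [single_block_scales(names, name, scale) for name in names]


# Hypothetical block names; a real model exposes its own layer identifiers.
blocks = [f"block_{i}" for i in range(4)]
for scales in sweep(blocks):
    # In a real experiment, each map would be applied to the adapter
    # before generating an image for comparison.
    print(scales)
```

Each map enables the adapter in exactly one block, so differences between the generated images can be attributed to that block alone.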

add model cache after loaded

Looks great! How do I use it?