Wu Chengyue

29 comments by Wu Chengyue

I use the arXiv subset of the proof-pile-2 dataset (https://huggingface.co/datasets/EleutherAI/proof-pile-2).

Yes, of course. I will organize the code soon. Thanks for your interest.

> Hey @hills-code, could you also add code for converting a model by adding identity blocks for training? I am excited to use similar techniques for other open-source models...

I have uploaded the training code to this repo. You can also check https://github.com/hills-code/open-instruct/tree/llama-pro.

1. For the interleaved expansion, we followed figure (c) of *Automated Progressive Learning for Efficient Training of Vision Transformers*. The expansion positions used here are not necessarily optimal; Yi-9B and SOLAR chose different expansion positions, so this is worth exploring.
2. For zero initialization, an earlier paper, *Staged Training for Transformer Language Models*, proposed initializing the LayerNorm. We found that under the LLaMA setup, setting the LayerNorm to zero makes the gradients zero so the new layers cannot be trained (we analyze this in the paper), so we zero-initialize down_proj and o_proj instead; a minimal sketch of this initialization is shown below.

![image](https://github.com/TencentARC/LLaMA-Pro/assets/60053707/0d89ad5c-5aa8-46a0-bdd9-dcec06f9635a)
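
Not the repo's exact code, but a minimal sketch of that initialization, assuming the Hugging Face `LlamaDecoderLayer`: zeroing `o_proj` and `down_proj` makes both residual branches output zero, so a newly added block starts out as an identity mapping.

```python
# Minimal sketch: zero-initialize o_proj and down_proj of a new decoder layer
# so that, through the residual connections, the block is initially an identity.
import torch
from transformers import LlamaConfig
from transformers.models.llama.modeling_llama import LlamaDecoderLayer

config = LlamaConfig()  # default config, just for illustration
# layer_idx is required in recent transformers versions; older ones take only config
layer = LlamaDecoderLayer(config, layer_idx=0)

with torch.no_grad():
    layer.self_attn.o_proj.weight.zero_()  # attention branch outputs 0 -> x + attn(x) == x
    layer.mlp.down_proj.weight.zero_()     # MLP branch outputs 0       -> x + mlp(x) == x
```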

Following the adapter approach, we zero out the weights at the output side. I haven't worked out the gradients for zeroing at the up projection, so I'm not sure whether gradients would flow there; you could try it.

You can refer to the official Hugging Face Datasets documentation for loading txt files: https://huggingface.co/docs/datasets/nlp_load
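
For reference, a minimal example following that documentation page (the file names are placeholders):

```python
# Load plain-text files with Hugging Face Datasets; each line becomes one example.
from datasets import load_dataset

dataset = load_dataset("text", data_files={"train": "train.txt", "validation": "valid.txt"})
print(dataset["train"][0]["text"])  # first line of train.txt
```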

1. This file is simply the model parameters with the identity layers already added; given the same input, its output should match the original model's output.
2. Once you have this file, you also need to change the number of layers in the model config so that it matches the layer count after expansion. I think you should still do pretraining first and then SFT, because the parameters of the newly added layers likely need some pretraining to adapt before SFT works well. A rough sketch of the expansion step is shown below.
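
A rough sketch of what such a conversion could look like, assuming the Hugging Face `LlamaForCausalLM` API; the base model name, expansion interval, and output path are placeholders, not the repo's actual script:

```python
# Sketch of block expansion: after every `interval` original layers, insert a
# copied layer with o_proj/down_proj zeroed (identity block), then update the
# config's num_hidden_layers to stay consistent with the new depth.
import copy
import torch
from transformers import LlamaForCausalLM

model = LlamaForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")  # example base model
interval = 4  # e.g. 32 layers -> 40 layers (one new block per 4 original blocks)

expanded = torch.nn.ModuleList()
for i, layer in enumerate(model.model.layers):
    expanded.append(layer)
    if (i + 1) % interval == 0:
        new_layer = copy.deepcopy(layer)
        with torch.no_grad():
            new_layer.self_attn.o_proj.weight.zero_()
            new_layer.mlp.down_proj.weight.zero_()
        expanded.append(new_layer)

model.model.layers = expanded
model.config.num_hidden_layers = len(expanded)  # keep config in sync with the new depth
# note: in recent transformers versions the per-layer layer_idx (used by the KV cache)
# may also need renumbering after insertion.
model.save_pretrained("llama-2-7b-expanded")    # placeholder output path
```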

The selection of which parameters to train for PT or SFT is implemented in customized_trainer.py; you can pass the indices of the newly added layers via extend_layers in the script, and only those parameters will be trained.
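
An illustrative sketch of that idea (not the actual customized_trainer.py code); the checkpoint path and the extend_layers indices are placeholders matching the expansion sketch above:

```python
# Freeze everything except the newly inserted layers, selected by their indices.
from transformers import LlamaForCausalLM

model = LlamaForCausalLM.from_pretrained("llama-2-7b-expanded")  # placeholder path
extend_layers = [4, 9, 14, 19, 24, 29, 34, 39]  # hypothetical indices of the added blocks

for name, param in model.named_parameters():
    # parameter names look like "model.layers.<idx>.self_attn.o_proj.weight"
    param.requires_grad = any(f"model.layers.{idx}." in name for idx in extend_layers)

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"trainable parameters: {trainable}")
```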