Wu Chengyue

29 comments by Wu Chengyue

I use the arXiv subset of the proof-pile-2 dataset (https://huggingface.co/datasets/EleutherAI/proof-pile-2).

Yes, of course. I will organize the code soon. Thanks for your interest.

> Hey @hills-code, could you also add code for converting a model by adding identity blocks for training? I am excited to use similar techniques for other open-source models...

I have uploaded the training code to this repo. You can also check https://github.com/hills-code/open-instruct/tree/llama-pro.

1. For the interleaved expansion, we followed figure (c) of *Automated Progressive Learning for Efficient Training of Vision Transformers*. The expansion positions used here are not necessarily optimal; Yi-9B and SOLAR chose different expansion positions, so this is worth exploring.
2. For zero initialization, an earlier paper, *Staged Training for Transformer Language Models*, proposed initializing the LayerNorm. We found that under the LLaMA setup, setting the LayerNorm to zero makes the gradients zero so the new layers cannot be trained (we analyze this in the paper), so we zero-initialize down_proj and o_proj instead; a minimal sketch of this initialization is shown below.

![image](https://github.com/TencentARC/LLaMA-Pro/assets/60053707/0d89ad5c-5aa8-46a0-bdd9-dcec06f9635a)
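
Not the repo's exact code, but a minimal sketch of that initialization, assuming the Hugging Face `LlamaDecoderLayer`: zeroing `o_proj` and `down_proj` makes both residual branches output zero, so a newly added block starts out as an identity mapping.

```python
# Minimal sketch: zero-initialize o_proj and down_proj of a new decoder layer
# so that, through the residual connections, the block is initially an identity.
import torch
from transformers import LlamaConfig
from transformers.models.llama.modeling_llama import LlamaDecoderLayer

config = LlamaConfig()  # default config, just for illustration
# layer_idx is required in recent transformers versions; older ones take only config
layer = LlamaDecoderLayer(config, layer_idx=0)

with torch.no_grad():
    layer.self_attn.o_proj.weight.zero_()  # attention branch outputs 0 -> x + attn(x) == x
    layer.mlp.down_proj.weight.zero_()     # MLP branch outputs 0       -> x + mlp(x) == x
```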

Following the adapter approach, we zero out the weights at the output side. I haven't worked out the gradients for zeroing at the up projection, so I'm not sure whether gradients would flow there; you could try it.

You can refer to the official Hugging Face Datasets documentation for loading txt files: https://huggingface.co/docs/datasets/nlp_load
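
For reference, a minimal example following that documentation page (the file names are placeholders):

```python
# Load plain-text files with Hugging Face Datasets; each line becomes one example.
from datasets import load_dataset

dataset = load_dataset("text", data_files={"train": "train.txt", "validation": "valid.txt"})
print(dataset["train"][0]["text"])  # first line of train.txt
```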

1. This file is simply the model parameters with the identity layers already added; given the same input, its output should match the original model's output.
2. Once you have this file, you also need to change the number of layers in the model config so that it matches the layer count after expansion. I think you should still do pretraining first and then SFT, because the parameters of the newly added layers likely need some pretraining to adapt before SFT works well. A rough sketch of the expansion step is shown below.
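
A rough sketch of what such a conversion could look like, assuming the Hugging Face `LlamaForCausalLM` API; the base model name, expansion interval, and output path are placeholders, not the repo's actual script:

```python
# Sketch of block expansion: after every `interval` original layers, insert a
# copied layer with o_proj/down_proj zeroed (identity block), then update the
# config's num_hidden_layers to stay consistent with the new depth.
import copy
import torch
from transformers import LlamaForCausalLM

model = LlamaForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")  # example base model
interval = 4  # e.g. 32 layers -> 40 layers (one new block per 4 original blocks)

expanded = torch.nn.ModuleList()
for i, layer in enumerate(model.model.layers):
    expanded.append(layer)
    if (i + 1) % interval == 0:
        new_layer = copy.deepcopy(layer)
        with torch.no_grad():
            new_layer.self_attn.o_proj.weight.zero_()
            new_layer.mlp.down_proj.weight.zero_()
        expanded.append(new_layer)

model.model.layers = expanded
model.config.num_hidden_layers = len(expanded)  # keep config in sync with the new depth
# note: in recent transformers versions the per-layer layer_idx (used by the KV cache)
# may also need renumbering after insertion.
model.save_pretrained("llama-2-7b-expanded")    # placeholder output path
```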

The selection of which parameters to train for PT or SFT is implemented in customized_trainer.py; you can pass the indices of the newly added layers via extend_layers in the script, and only those parameters will be trained.
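
An illustrative sketch of that idea (not the actual customized_trainer.py code); the checkpoint path and the extend_layers indices are placeholders matching the expansion sketch above:

```python
# Freeze everything except the newly inserted layers, selected by their indices.
from transformers import LlamaForCausalLM

model = LlamaForCausalLM.from_pretrained("llama-2-7b-expanded")  # placeholder path
extend_layers = [4, 9, 14, 19, 24, 29, 34, 39]  # hypothetical indices of the added blocks

for name, param in model.named_parameters():
    # parameter names look like "model.layers.<idx>.self_attn.o_proj.weight"
    param.requires_grad = any(f"model.layers.{idx}." in name for idx in extend_layers)

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"trainable parameters: {trainable}")
```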