Wu Chengyue

Results: 29 comments by Wu Chengyue

1. You do not need pytorch_model.bin.index.json. For the other necessary files, you can just copy them from the original base model. 2. The code can directly load the dataset from the Hugging Face...
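A minimal sketch of both points, assuming standard Hugging Face tooling; the directory names, file list, and dataset id are placeholders for illustration, not the repository's actual paths:

```python
# Sketch: reuse the base model's auxiliary files and load training data from the Hub.
import shutil
from pathlib import Path
from datasets import load_dataset

base_dir = Path("llama-2-7b-hf")           # original base model (placeholder path)
expanded_dir = Path("llama-pro-expanded")  # directory holding the expanded weights (placeholder path)
expanded_dir.mkdir(parents=True, exist_ok=True)

# Tokenizer and related files can simply be copied over from the base model;
# no pytorch_model.bin.index.json is needed.
for name in ["tokenizer.model", "tokenizer_config.json", "special_tokens_map.json"]:
    src = base_dir / name
    if src.exists():
        shutil.copy(src, expanded_dir / name)

# The training code can load the corpus directly from the Hugging Face Hub.
dataset = load_dataset("wikitext", "wikitext-2-raw-v1", split="train")  # placeholder corpus
print(dataset)
```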

Hi! Have you tried directly fine-tuning llama-3-8B-instruct? What happens in that setting? I did not carry out experiments with llama-3, so I may not be very familiar...

Certainly! Here is the link to Yi-9B (https://huggingface.co/01-ai/Yi-9B) and its tech report (https://arxiv.org/pdf/2403.04652). You can find the depth upscaling in Sec. 7.3. ![image](https://github.com/TencentARC/LLaMA-Pro/assets/60053707/9021206c-7192-43e4-bd42-05d3ea9b0833) See also LLaMa3-120B: https://huggingface.co/alpindale/goliath-120b

Yes, you can directly finetune the 8B model with any dataset. You can access the model on Hugging Face (https://huggingface.co/TencentARC/LLaMA-Pro-8B) and use it just like a normal LLaMA model.
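A minimal usage sketch with the standard transformers API; the prompt and generation settings below are just illustrative:

```python
# Sketch: load LLaMA-Pro-8B like any other LLaMA checkpoint and generate text.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("TencentARC/LLaMA-Pro-8B")
model = AutoModelForCausalLM.from_pretrained(
    "TencentARC/LLaMA-Pro-8B",
    torch_dtype=torch.float16,
    device_map="auto",
)

inputs = tokenizer("def quicksort(arr):", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```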

It is expected to be released by this month! Thanks for your attention!

Thanks for your attention! I think the main difference between our work and PEFT methods is that we scale up the number of parameters. We have observed the power of scaling, as with GPT,...

It depends on the number of layers you add and on the training setup. In my experience, 8x A100-40G can support pre-training with ctx-length=4096. I have tried increasing the LoRA rank to 1024 so that LoRA's trainable parameter count is close to ours; in that case the GPU memory usage is also about the same.
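For a rough sense of why rank 1024 puts LoRA in the same ballpark, here is a back-of-the-envelope sketch; the LLaMA-7B dimensions, attention-only LoRA targets, and the count of eight added blocks are my own assumptions for illustration, not numbers from this thread:

```python
# Rough comparison: LoRA (r=1024) trainable params vs. 8 newly added transformer blocks.
hidden = 4096   # LLaMA-7B hidden size
inter = 11008   # LLaMA-7B MLP intermediate size
n_layers = 32   # number of original decoder layers

def lora_params(r, d_in, d_out):
    # LoRA adds A (r x d_in) and B (d_out x r) per targeted weight matrix.
    return r * (d_in + d_out)

r = 1024
attn_per_layer = 4 * lora_params(r, hidden, hidden)   # q/k/v/o projections (assumed targets)
lora_total = n_layers * attn_per_layer
print(f"LoRA r={r} on attention: {lora_total / 1e9:.2f}B trainable params")

# One LLaMA-7B block: 4 attention projections + 3 MLP matrices (gate/up/down).
block = 4 * hidden * hidden + 3 * hidden * inter
added_blocks = 8                                      # assumed number of copied blocks
print(f"{added_blocks} added blocks: {added_blocks * block / 1e9:.2f}B trainable params")
```

Both land at roughly 1-1.6B trainable parameters, which is why the memory footprint ends up comparable.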

Yes, but if there are many newly added layers to train, that also brings a large memory footprint, and during training the original model's parameters still need to be loaded even though they are not fine-tuned.

We are also exploring larger models, but such experiments are very resource-intensive. So far we have explored expansion on different architectures, such as Mistral, with some success, e.g. [Mistral-Pro](https://huggingface.co/TencentARC/Mistral_Pro_8B_v0.1), and we will continue to explore ideas in this direction. We also noticed that Yi recently used depth expansion to train for math and code, [Yi-9B](https://www.qbitai.com/2024/03/126184.html), expanding by 16 layers. I believe the position and number of the copied layers still leave a lot worth studying, and we will investigate these step by step.
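Since the copied positions and counts come up repeatedly in this thread, here is a minimal sketch of depth/block expansion on a Hugging Face LLaMA-style model. The base checkpoint, the interleaving interval, and the zero-initialization of the copied blocks' output projections are illustrative assumptions, not the exact recipe of LLaMA-Pro or Yi-9B:

```python
# Sketch: interleave copies of existing decoder blocks, initialize them to act as
# identity mappings, and train only the new blocks while the original weights stay frozen.
import copy
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("meta-llama/Meta-Llama-3-8B")  # placeholder base checkpoint
layers = model.model.layers     # nn.ModuleList of decoder blocks
expand_every = 4                # copy one block after every 4 original blocks (assumption)

new_layers = []
for i, layer in enumerate(layers):
    new_layers.append(layer)
    if (i + 1) % expand_every == 0:
        new_block = copy.deepcopy(layer)
        # Zero the output projections so the copied block starts as an identity mapping.
        torch.nn.init.zeros_(new_block.self_attn.o_proj.weight)
        torch.nn.init.zeros_(new_block.mlp.down_proj.weight)
        new_layers.append(new_block)

model.model.layers = torch.nn.ModuleList(new_layers)
model.config.num_hidden_layers = len(new_layers)

# Re-number the per-layer index used by the KV cache (attribute present in recent transformers).
for i, layer in enumerate(model.model.layers):
    if hasattr(layer.self_attn, "layer_idx"):
        layer.self_attn.layer_idx = i

# Freeze everything, then unfreeze only the inserted copies; note that the frozen
# original parameters are still loaded and consume memory during training.
for p in model.parameters():
    p.requires_grad = False
for i, layer in enumerate(model.model.layers):
    if (i + 1) % (expand_every + 1) == 0:   # positions of the inserted copies
        for p in layer.parameters():
            p.requires_grad = True
```

Changing `expand_every`, the number of copied blocks, or where they are inserted is exactly the design space mentioned above.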