Chinese-LLaMA-Alpaca
Results of pre-training stage 1
Thank you for using the Issue submission template. Please follow the steps below to provide relevant information. We will prioritize issues with relatively complete information. Your cooperation is appreciated.
Hint: Fill in the [ ] with an x to mark it as checked. Delete any option that is not related to this issue.
Please check the following before asking
- [x] Due to frequent dependency updates, please ensure you have followed the steps in our Wiki
- [x] I have read the FAQ section AND searched for similar issues and did not find a similar problem or solution
- [x] For third-party plugin issues (e.g., llama.cpp, text-generation-webui, LlamaChat), we recommend checking the corresponding project for solutions
Choose the issue type
Base model:
- [x] LLaMA
- [ ] Alpaca
Issue type:
- [ ] Download issue
- [ ] Model conversion and merging issue
- [ ] Model inference issue (🤗 transformers)
- [ ] Model quantization and deployment issue (llama.cpp, text-generation-webui, LlamaChat)
- [ ] Performance issue
- [x] Other issues
Describe the issue in detail
Great work! Can I ask what the results of pre-training stage 1 were? Could you elaborate on why you chose to do both stages (instead of only stage 1 or only stage 2)? Thank you for your time!

Provide a screenshot or log of the issue
(If necessary) Please provide a text log or screenshot to help us better understand the issue details.
After PT stage 1, the CLM loss was about 5~6. (We forgot to train the LM head in PT stage 1; we expect the loss could be much lower with a trainable LM head.)
The two-phase training was our initial exploratory plan; however, it is not guaranteed to be optimal.
As for the 13B model, we directly applied PT stage 2 without PT stage 1, since training a 13B model is more expensive and we found PT stage 2 to be more training-efficient.
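For context, here is a minimal sketch of what a stage-1-style setup looks like, assuming the pattern of freezing the transformer body and training only the vocabulary-extended embeddings; the paths are placeholders, and the final check illustrates how an unintentionally frozen LM head, as mentioned above, can be spotted. This is not the project's actual training script.

```python
# Minimal sketch (not the project's script), assuming stage 1 trains only the embeddings
# of the vocabulary-extended model while the transformer body stays frozen.
# Paths are placeholders.
import torch
from transformers import LlamaForCausalLM, LlamaTokenizer

tokenizer = LlamaTokenizer.from_pretrained("path/to/chinese-llama-tokenizer")
model = LlamaForCausalLM.from_pretrained("path/to/llama-7b", torch_dtype=torch.float16)
model.resize_token_embeddings(len(tokenizer))  # extend to the merged Chinese vocabulary

# Freeze everything, then unfreeze only the input embeddings.
for param in model.parameters():
    param.requires_grad = False
for param in model.get_input_embeddings().parameters():
    param.requires_grad = True

# Quick check for the oversight mentioned above: is the LM head actually trainable?
lm_head_trainable = any(p.requires_grad for p in model.get_output_embeddings().parameters())
print("LM head trainable:", lm_head_trainable)  # False here; set requires_grad=True to train it
```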
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your consideration.
Closing the issue, since no updates observed. Feel free to re-open if you need any further assistance.
@airaria Can you please tell me how to perform pre-training stage 1 using the file run_clm_pt_with_peft.py?
Do we have to comment out the PEFT code, i.e., comment out:
- `PeftModel.from_pretrained(model, training_args.peft_path)`
- `get_peft_model(model, peft_config)`
Do I understand correctly?
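For illustration only (not an official answer): one way to express the change described above without deleting code is to guard the two PEFT calls behind a flag. The flag, paths, and LoRA hyperparameters below are hypothetical and are not arguments of the actual script.

```python
# Hypothetical sketch of the modification asked about above: guard the two PEFT calls with a
# flag instead of commenting them out. `use_peft`, the paths, and the LoRA hyperparameters
# are placeholders, not options of run_clm_pt_with_peft.py.
import torch
from peft import LoraConfig, PeftModel, TaskType, get_peft_model
from transformers import LlamaForCausalLM

use_peft = False   # True: LoRA (PEFT) training; False: stage-1-style embedding training
peft_path = None   # stands in for training_args.peft_path
model = LlamaForCausalLM.from_pretrained("path/to/llama-7b", torch_dtype=torch.float16)

if use_peft:
    if peft_path is not None:
        # Resume from an existing LoRA checkpoint (the PeftModel.from_pretrained call above).
        model = PeftModel.from_pretrained(model, peft_path)
    else:
        # Fresh LoRA wrapping (the get_peft_model call above).
        peft_config = LoraConfig(task_type=TaskType.CAUSAL_LM, r=8, lora_alpha=32,
                                 target_modules=["q_proj", "v_proj"])
        model = get_peft_model(model, peft_config)
else:
    # No PEFT wrapping: pick the trainable parameters by hand, e.g. only the embeddings
    # and the LM head for a stage-1-style run.
    for name, param in model.named_parameters():
        param.requires_grad = name.endswith(("embed_tokens.weight", "lm_head.weight"))
```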