Chinese-LLaMA-Alpaca

Results of pre-training stage 1

Open larrylawl opened this issue 1 year ago • 2 comments

Thank you for using the Issue submission template. Please follow the steps below to provide relevant information. We will prioritize issues with relatively complete information. Your cooperation is appreciated.

Hint: Fill in the [ ] with an x to mark it as checked. Delete any option that is not related to this issue.

Please check the following before asking

  • [x] Due to frequent dependency updates, please ensure you have followed the steps in our Wiki
  • [x] I have read the FAQ section AND searched for similar issues and did not find a similar problem or solution
  • [x] Third-party plugin issues: e.g., llama.cpp, text-generation-webui, LlamaChat, we recommend checking the corresponding project for solutions

Choose the issue type

Base model:

  • [x] LLaMA
  • [ ] Alpaca

Issue type:

  • [ ] Download issue
  • [ ] Model conversion and merging issue
  • [ ] Model inference issue (🤗 transformers)
  • [ ] Model quantization and deployment issue (llama.cpp, text-generation-webui, LlamaChat)
  • [ ] Performance issue
  • [x] Other issues

Describe the issue in detail

Great work! Can I ask what the results of pre-training stage 1 were? Could you also elaborate on why you chose to do both stages (instead of only stage 1 or only stage 2)? Thank you for your time!

[Screenshot attached: Screenshot 2023-04-24 at 10 14 08 PM]

Provide a screenshot or log of the issue

(If necessary) Please provide a text log or screenshot to help us better understand the issue details.

larrylawl · Apr 24 '23 14:04

After PT stage 1, the CLM loss was about 5~6. (We forgot to make the LM head trainable in PT stage 1; we expect the loss could be much lower with a trainable LM head.)

The two-phase training was our initial exploratory plan; however, it is not guaranteed to be optimal.

As for the 13B model, we applied PT stage 2 directly, skipping PT stage 1, since training a 13B model is more expensive and we found PT stage 2 to be more training-efficient.

airaria · Apr 24 '23 15:04
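For readers wondering what a "trainable LM head" means in practice here: in PT stage 1 only a small subset of weights is updated, and the comment above notes that the output projection (LM head) was accidentally left frozen. Below is a minimal illustrative sketch, not the repository's actual training code, of freezing a LLaMA model except for its input embeddings and LM head using the Hugging Face transformers API; the checkpoint path is a placeholder.

```python
# Illustrative sketch only (not the repository's actual stage-1 code):
# freeze a LLaMA model except for its input embeddings and LM head.
from transformers import LlamaForCausalLM

model = LlamaForCausalLM.from_pretrained("path/to/llama-with-extended-vocab")  # placeholder path

# Freeze every parameter first.
for param in model.parameters():
    param.requires_grad = False

# Un-freeze the parts that stage 1 adapts to the extended Chinese vocabulary:
# the input embeddings and, as discussed above, ideally the LM head as well.
for param in model.get_input_embeddings().parameters():
    param.requires_grad = True
for param in model.get_output_embeddings().parameters():
    param.requires_grad = True  # the "trainable LM head"

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"Trainable parameters: {trainable:,}")
```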

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your consideration.

github-actions[bot] · May 02 '23 00:05

Closing the issue, since no updates observed. Feel free to re-open if you need any further assistance.

github-actions[bot] · May 07 '23 00:05

@airaria Can you please tell me how to perform pre-training stage 1 using the file run_clm_pt_with_peft.py?

Do we have to comment out the PEFT code, i.e., comment out:

  1. PeftModel.from_pretrained(model, training_args.peft_path) and
  2. get_peft_model(model, peft_config)

Do I understand correctly?

adeepak7 · Nov 20 '23 17:11
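The question above was not answered in the thread. Purely as an illustration of the approach adeepak7 describes, the sketch below loads the base model without any PEFT wrapping and restricts training to the embedding and LM-head weights. Paths are placeholders, and this is not the maintainers' confirmed stage-1 procedure.

```python
# Hypothetical sketch of the suggestion above (not the maintainers' confirmed recipe).
# Instead of wrapping the model via PeftModel.from_pretrained(...) or get_peft_model(...),
# the base model is used directly and only the embedding / LM-head weights stay trainable.
from transformers import LlamaForCausalLM, LlamaTokenizer

tokenizer = LlamaTokenizer.from_pretrained("path/to/chinese-llama-tokenizer")  # placeholder
model = LlamaForCausalLM.from_pretrained("path/to/original-llama")             # placeholder
model.resize_token_embeddings(len(tokenizer))  # account for the extended Chinese vocabulary

for name, param in model.named_parameters():
    # Train only the embedding and LM-head weights; freeze everything else.
    param.requires_grad = any(key in name for key in ("embed_tokens", "lm_head"))

# The unwrapped `model` would then be passed to the Trainer as in run_clm_pt_with_peft.py.
```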