Lion
Implementation Details about the Student Model
Hi Yuxin,
Thank you for your great work! In your paper you mention that your method runs 3 training iterations, and that in each iteration you train the student model for 3 epochs using an AdamW optimizer with learning rate = 2e-5. I would like to clarify one point: in each iteration, does the student model start from the same pre-trained LLaMA model, or from the model trained in the previous iteration? Thank you for your clarification!
Thanks for your interest in our work. In each iteration, the student model starts from the model trained in the previous iteration, not from the original pre-trained LLaMA checkpoint.
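For anyone reading this later, here is a minimal sketch of the difference between the two options discussed above. It is not the repository's training code: `Student`, `fine_tune`, `run_iterations`, and the data tags are hypothetical stand-ins, and the "training" only records bookkeeping. The iteration count, epoch count, and learning rate are the values quoted in the question. Whether optimizer state carries over between iterations, and what data each iteration uses, is not stated in this thread.

```python
from dataclasses import dataclass, field

NUM_ITERATIONS = 3        # quoted in the question
EPOCHS_PER_ITERATION = 3  # quoted in the question
LEARNING_RATE = 2e-5      # AdamW learning rate quoted in the question


@dataclass
class Student:
    """Hypothetical stand-in for a model checkpoint; tracks its lineage."""
    init_from: str
    epochs_trained: int = 0
    history: list = field(default_factory=list)


def fine_tune(student, data_tag, epochs, lr):
    """Placeholder for real fine-tuning; only records what would be run."""
    student.epochs_trained += epochs
    student.history.append((data_tag, epochs, lr))
    return student


def run_iterations(warm_start=True):
    """warm_start=True continues from the previous iteration's student
    (the behaviour confirmed in the reply); warm_start=False restarts
    from the pre-trained model each time (the alternative asked about)."""
    student = Student(init_from="pretrained-llama")
    for it in range(1, NUM_ITERATIONS + 1):
        if not warm_start:
            student = Student(init_from="pretrained-llama")
        # The data tag is a placeholder; the real per-iteration data is not
        # described in this thread.
        student = fine_tune(student, f"iteration-{it}-data",
                            EPOCHS_PER_ITERATION, LEARNING_RATE)
    return student


continued = run_iterations(warm_start=True)
restarted = run_iterations(warm_start=False)
print(continued.epochs_trained, restarted.epochs_trained)
```

With the confirmed behaviour, the final student has accumulated training from all three iterations; under the restart alternative it would only reflect the last one.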