Max Khanov
Thanks so much for the quick reply! I'm pretty sure we ran SFT on the model before RM training. Our SFT train loss was about 1.539, and the test loss...
Also, would it be possible to share the checkpoint files for the LLaMA-SFT-7B or the reward model?
@WeiXiongUST Any updates?
Thanks so much for following up @WeiXiongUST, we used LoRA for all the steps (SFT and RM).
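For context, a minimal NumPy sketch of what LoRA does to a linear layer (hypothetical shapes and hyperparameters, not our actual config): the base weight W is frozen, and only a low-rank update B @ A, scaled by alpha / r, is trained.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: r << d_in is the low-rank bottleneck.
d_in, d_out, r, alpha = 16, 16, 4, 8

W = rng.normal(size=(d_out, d_in))     # frozen pretrained weight
A = rng.normal(size=(r, d_in)) * 0.01  # trainable, initialized small
B = np.zeros((d_out, r))               # trainable, initialized to zero

def lora_forward(x):
    # Effective weight is W + (alpha / r) * B @ A.
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.normal(size=d_in)
# With B initialized to zero, the adapted layer matches the base layer exactly,
# so fine-tuning starts from the pretrained model's behavior.
assert np.allclose(lora_forward(x), W @ x)
```

In practice we used a library adapter rather than hand-rolled matrices; this just illustrates why LoRA keeps the trainable parameter count small across the SFT and RM stages.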
Also, is LoRA used during the SFT training?
Thanks so much, we'll look into this!