mayank goyal
Any plans to support this on Text Generation Inference?
The above were training logs; the eval logs look the same, and rewards/chosen is decreasing. "log_history": [ { "epoch": 0.09, "eval_logits/chosen": -0.8852691054344177, "eval_logits/rejected": -0.8777562379837036, "eval_logps/chosen": -28.69784927368164, "eval_logps/rejected": -106.26754760742188, "eval_loss": 0.2819069027900696, "eval_rewards/accuracies":...
Information about the model and training:
Task: question answering from the context of documents.
Architecture: Llama-2-7B-Chat.
Finetuning: standard LoRA finetuning with the DPO loss, beta = 0.1.
Yes, SFT was done on...
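For reference, here is a minimal sketch of the DPO objective as I understand it (function name and tensor shapes are my own; I'm assuming per-sequence summed log-probs from the policy and the frozen reference model are already computed). It shows why rewards/chosen can decrease while the loss still falls: the loss only depends on the reward *margin*, so both implicit rewards can drift negative as long as rewards/rejected drops faster.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Sketch of the DPO loss on summed per-sequence log-probs."""
    # Implicit rewards: beta * log-ratio between policy and reference.
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # The loss depends only on the margin (chosen - rejected), not on
    # either reward's absolute value.
    loss = -F.logsigmoid(chosen_rewards - rejected_rewards)
    return loss.mean(), chosen_rewards, rejected_rewards

# Example with numbers shaped like the logs above: chosen logps drop
# slightly vs. the reference, rejected logps drop much more.
loss, cr, rr = dpo_loss(
    policy_chosen_logps=torch.tensor([-30.0]),
    policy_rejected_logps=torch.tensor([-110.0]),
    ref_chosen_logps=torch.tensor([-28.0]),
    ref_rejected_logps=torch.tensor([-100.0]),
)
# cr is negative (rewards/chosen decreasing), yet loss is well below
# log(2) ~ 0.693 because the margin is positive.
```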
@eric-mitchell, let me know if my understanding is correct.