
TRL ORPO gives NaN for everything

Open gagan3012 opened this issue 11 months ago • 6 comments

{'loss': 0.0, 'grad_norm': nan, 'learning_rate': 2.5000000000000002e-08, 'rewards/chosen': nan, 'rewards/rejected': nan, 'rewards/accuracies': 0.0, 'rewards/margins': nan, 'logps/rejected': nan, 'logps/chosen': nan, 'logits/rejected': -3.114448070526123, 'logits/chosen': -3.114448070526123, 'nll_loss': nan, 'log_odds_ratio': nan, 'log_odds_chosen': nan, 'epoch': 0.0}
{'loss': 0.0, 'grad_norm': nan, 'learning_rate': 5.0000000000000004e-08, 'rewards/chosen': nan, 'rewards/rejected': nan, 'rewards/accuracies': 0.0, 'rewards/margins': nan, 'logps/rejected': nan, 'logps/chosen': nan, 'logits/rejected': nan, 'logits/chosen': nan, 'nll_loss': nan, 'log_odds_ratio': nan, 'log_odds_chosen': nan, 'epoch': 0.0}
{'loss': 0.0, 'grad_norm': nan, 'learning_rate': 7.500000000000001e-08, 'rewards/chosen': nan, 'rewards/rejected': nan, 'rewards/accuracies': 0.0, 'rewards/margins': nan, 'logps/rejected': nan, 'logps/chosen': nan, 'logits/rejected': nan, 'logits/chosen': nan, 'nll_loss': nan, 'log_odds_ratio': nan, 'log_odds_chosen': nan, 'epoch': 0.0}
{'loss': 0.0, 'grad_norm': nan, 'learning_rate': 1.0000000000000001e-07, 'rewards/chosen': nan, 'rewards/rejected': nan, 'rewards/accuracies': 0.0, 'rewards/margins': nan, 'logps/rejected': nan, 'logps/chosen': nan, 'logits/rejected': nan, 'logits/chosen': nan, 'nll_loss': nan, 'log_odds_ratio': nan, 'log_odds_chosen': nan, 'epoch': 0.0}
{'loss': 0.0, 'grad_norm': nan, 'learning_rate': 1.2500000000000002e-07, 'rewards/chosen': nan, 'rewards/rejected': nan, 'rewards/accuracies': 0.0, 'rewards/margins': nan, 'logps/rejected': nan, 'logps/chosen': nan, 'logits/rejected': nan, 'logits/chosen': nan, 'nll_loss': nan, 'log_odds_ratio': nan, 'log_odds_chosen': nan, 'epoch': 0.0}
{'loss': 0.0, 'grad_norm': nan, 'learning_rate': 1.5000000000000002e-07, 'rewards/chosen': nan, 'rewards/rejected': nan, 'rewards/accuracies': 0.0, 'rewards/margins': nan, 'logps/rejected': nan, 'logps/chosen': nan, 'logits/rejected': nan, 'logits/chosen': nan, 'nll_loss': nan, 'log_odds_ratio': nan, 'log_odds_chosen': nan, 'epoch': 0.0}
{'loss': 0.0, 'grad_norm': nan, 'learning_rate': 1.7500000000000002e-07, 'rewards/chosen': nan, 'rewards/rejected': nan, 'rewards/accuracies': 0.0, 'rewards/margins': nan, 'logps/rejected': nan, 'logps/chosen': nan, 'logits/rejected': nan, 'logits/chosen': nan, 'nll_loss': nan, 'log_odds_ratio': nan, 'log_odds_chosen': nan, 'epoch': 0.0}
{'loss': 0.0, 'grad_norm': nan, 'learning_rate': 2.0000000000000002e-07, 'rewards/chosen': nan, 'rewards/rejected': nan, 'rewards/accuracies': 0.0, 'rewards/margins': nan, 'logps/rejected': nan, 'logps/chosen': nan, 'logits/rejected': nan, 'logits/chosen': nan, 'nll_loss': nan, 'log_odds_ratio': nan, 'log_odds_chosen': nan, 'epoch': 0.0}
{'loss': 0.0, 'grad_norm': nan, 'learning_rate': 2.2500000000000002e-07, 'rewards/chosen': nan, 'rewards/rejected': nan, 'rewards/accuracies': 0.0, 'rewards/margins': nan, 'logps/rejected': nan, 'logps/chosen': nan, 'logits/rejected': nan, 'logits/chosen': nan, 'nll_loss': nan, 'log_odds_ratio': nan, 'log_odds_chosen': nan, 'epoch': 0.0}

Using the example script orpo.py, I get this error: every logged metric is NaN from the very first steps.
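For anyone triaging this, a minimal sketch (plain Python, not part of TRL) of how one might check which metrics are NaN at each logged step. The helper name `nan_keys` is hypothetical, and the two dicts are abbreviated copies of the first and second log entries from this report. It only describes what the log shows and does not identify the cause.

```python
import math

def nan_keys(metrics):
    """Return the names of all metrics whose value is a float NaN."""
    return [k for k, v in metrics.items() if isinstance(v, float) and math.isnan(v)]

nan = float("nan")

# Abbreviated copies of the first two logged entries from the report.
first_step = {
    "loss": 0.0,
    "grad_norm": nan,
    "logps/rejected": nan,
    "logps/chosen": nan,
    "logits/rejected": -3.114448070526123,
    "logits/chosen": -3.114448070526123,
    "nll_loss": nan,
}
second_step = {
    "loss": 0.0,
    "grad_norm": nan,
    "logps/rejected": nan,
    "logps/chosen": nan,
    "logits/rejected": nan,
    "logits/chosen": nan,
    "nll_loss": nan,
}

for name, step in [("first", first_step), ("second", second_step)]:
    print(name, nan_keys(step))
```

In the first entry the logits are still finite while the log-probabilities and loss terms are already NaN; from the second entry onward the logits are NaN as well.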

gagan3012 · Mar 23 '24 03:03