Hiroshige Aoki

2 issues by Hiroshige Aoki

# Checklist:

> [!IMPORTANT]
> Please review the checklist below before submitting your pull request.

- [x] Please open an issue before creating a PR or link to an existing...

🐞 bug
size:M

### Feature request

Add an option to the RLOOTrainer that enables the use of string-based reward models, such as BLEU and Levenshtein distance, for evaluating model outputs.

### Motivation

Currently,...

✨ enhancement
🏋 RLOO
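
To illustrate what a string-based reward of the kind mentioned in the feature request might look like, here is a minimal sketch of a Levenshtein-distance reward in plain Python. The function names `levenshtein` and `levenshtein_reward`, and the normalization to the range [0, 1], are assumptions made for this example; the sketch does not use or reflect the actual RLOOTrainer API.

```python
from typing import List


def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings (insertions, deletions, substitutions)."""
    # Keep the shorter string in the inner loop so the rows stay small.
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(
                min(
                    previous[j] + 1,         # deletion
                    current[j - 1] + 1,      # insertion
                    previous[j - 1] + cost,  # substitution or match
                )
            )
        previous = current
    return previous[-1]


def levenshtein_reward(completions: List[str], references: List[str]) -> List[float]:
    """Hypothetical reward: 1.0 for an exact match, falling toward 0.0 as edits grow."""
    rewards = []
    for completion, reference in zip(completions, references):
        longest = max(len(completion), len(reference))
        if longest == 0:
            rewards.append(1.0)  # two empty strings are treated as identical
        else:
            rewards.append(1.0 - levenshtein(completion, reference) / longest)
    return rewards
```

A reward of this shape operates on decoded text instead of token IDs, which is presumably why the request asks for a dedicated option on the trainer.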