Feature Request: String-Based Comparison Reward Model for RLOOTrainer

Open HiroshigeAoki opened this issue 4 months ago • 0 comments

Feature request

Add an option to RLOOTrainer that allows string-based metrics, such as BLEU and Levenshtein distance, to be used as the reward model when evaluating model outputs.

Motivation

Currently, the reward_model in RLOOTrainer accepts only tensor inputs, which prevents string-based metrics from being used as the reward signal. Supporting string comparison metrics would let users draw on a broader range of string similarity measures.
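
To make the request concrete, here is a minimal sketch of what a string-based reward could look like. It is not part of the TRL API: `levenshtein`, `similarity_reward`, and `string_rewards` are hypothetical names, and `decode` stands in for whatever turns token IDs back into text (for example, a tokenizer's decode method). It assumes one reference string per completion.

```python
from typing import Callable, List, Sequence


def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b (insertions, deletions, substitutions)."""
    if len(a) < len(b):
        a, b = b, a  # iterate over the longer string, keep rows short
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(
                min(
                    previous[j] + 1,         # deletion
                    current[j - 1] + 1,      # insertion
                    previous[j - 1] + cost,  # substitution or match
                )
            )
        previous = current
    return previous[-1]


def similarity_reward(completion: str, reference: str) -> float:
    """Map edit distance to [0, 1]; 1.0 means the strings are identical."""
    longest = max(len(completion), len(reference))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(completion, reference) / longest


def string_rewards(
    token_ids: Sequence[Sequence[int]],
    references: Sequence[str],
    decode: Callable[[Sequence[int]], str],  # hypothetical: token IDs -> text
) -> List[float]:
    """Decode each completion and score it against its reference string."""
    texts = [decode(ids) for ids in token_ids]
    return [similarity_reward(t, r) for t, r in zip(texts, references)]
```

With a toy vocabulary such as `{0: "a", 1: "b", 2: "c"}`, calling `string_rewards([[0, 1, 2]], ["abc"], decode)` scores an exact match as 1.0. How the resulting scores would be passed back into RLOOTrainer is the open question this issue raises.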

Your contribution

I am open to collaborating with the community to implement this feature!

HiroshigeAoki, Oct 25 '24 07:10