
A modular RL library to fine-tune language models to human preferences

Results 48 RL4LMs issues

Hi, I encountered this error when installing the rl4lm library with `pip install -e .`. The message says: > 'extras_require' must be a dictionary whose values are strings or...

I hope this message finds you well. I am writing to report an issue I encountered in the NLPO project that you maintain on GitHub. While executing the following block...

I'm trying to reproduce the results for NarrativeQA by directly running the command with the .yml configuration files. Below are the performances measured with ROUGE-L-Max. For PPO with supervision, I...

Hi all, I am encountering a gpu memory issue in metric evaluations. I am using the following metrics: ``` metrics: - id: meteor args: {} - id: rouge - id:...

Hi, thanks for publishing this awesome library. Can I add a configuration / modify the reward.py to return a vector instead of a scalar reward?

The repository uses transformers version 4.18, which does not support bloom. Is there any way to use bloom as the initial policy for training?

Hello, I believe I found a minor bug in `IntentAccuracyDailyDialog`, lines 672-3 in `envs/text_generation/metric.py`. The device is currently set with the following two lines: ``` self._device = "cuda" if torch.cuda.is_available()...

In lines 104-105 of rl4lms/envs/text_generation/warm_start.py, an IndexError occurs if any filename does not contain "_". Here is the crash: ``` key=lambda ckpt: int(ckpt.split("_")[1])) IndexError: list index out of range...
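
The crash reported above comes from indexing `split("_")[1]` on a name that has no underscore. A minimal sketch of a more defensive sort key follows. It is not the library's actual fix: the helper names `checkpoint_step` and `sort_checkpoints` are hypothetical, and it assumes checkpoint names carry an integer step right after the first "_".

```python
from typing import List, Optional


def checkpoint_step(name: str) -> Optional[int]:
    """Return the integer after the first "_" in name, or None if it is missing or not numeric.

    Hypothetical helper for illustration; not part of RL4LMs.
    """
    parts = name.split("_")
    if len(parts) < 2:
        # No underscore at all: this is the case that raised IndexError.
        return None
    try:
        return int(parts[1])
    except ValueError:
        # An underscore is present, but what follows is not an integer.
        return None


def sort_checkpoints(names: List[str]) -> List[str]:
    """Drop names without a parsable step, then sort the rest by step."""
    valid = [n for n in names if checkpoint_step(n) is not None]
    return sorted(valid, key=checkpoint_step)
```

With this sketch, unrelated files in the checkpoint directory are skipped instead of aborting the sort. Whether skipping or raising a clearer error is preferable depends on how the checkpoint directory is used.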

I am trying to load the t5 base model as per the t5_ppo config. Strangely, this error pops up. It works fine for t5-small. ``` size mismatch for decoder.final_layer_norm.weight: copying a param with...

Dear contributors, Thank you so much! This repo is excellent! What is the difference between raw_logits and processed_logits? How do they differ from the normal hugging face model.generate.score? Thank you,...