RL4LMs

A modular RL library to fine-tune language models to human preferences

Results 48 RL4LMs issues

I am trying to get RL4LMs to work, so I built the Docker image using the instructions in the README file. After building the container, I tried...

Make sure the transformer returns `past_key_values`. The code decides whether it is in a cached (past) step based on `past_key_values`, which affects the shape of `position_ids` and `input_ids`. When `use_cache` is true, the transformer will...
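To illustrate why the presence of a cache changes tensor shapes, here is a minimal sketch in plain Python lists. It is not RL4LMs code: `prepare_step_inputs` is a hypothetical helper, loosely modeled on the common pattern in decoder-only models where only the newest token and its position are fed once a key/value cache exists. Padded positions get a placeholder position of 0 here, which is an assumption of this sketch.

```python
def prepare_step_inputs(input_ids, attention_mask, has_past):
    """Hypothetical helper: build position ids and trim inputs when a cache exists.

    input_ids and attention_mask are batches given as lists of lists.
    """
    position_ids = []
    for mask_row in attention_mask:
        seen = 0  # number of real (non-padding) tokens seen so far
        row = []
        for m in mask_row:
            # Real tokens are numbered 0, 1, 2, ...; padding gets a placeholder 0.
            row.append(seen if m else 0)
            seen += m
        position_ids.append(row)

    if has_past:
        # With cached keys/values, only the last token needs to be processed,
        # so both sequences shrink to length 1.
        input_ids = [row[-1:] for row in input_ids]
        position_ids = [row[-1:] for row in position_ids]
    return input_ids, position_ids
```

If the model never returns a cache, `has_past` stays false and the full-length sequences are passed at every step, so any code that assumes length-1 inputs would see mismatched shapes.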

My yaml:
```
tokenizer:
  model_name: facebook/bart-large-cnn
  padding_side: left
  truncation_side: left
  pad_token_as_eos_token: False
reward_fn:
  id: rouge
  args:
    rouge_type: "rouge1"
datapool:
  id: cnn_daily_mail
  args:
    prompt_prefix: "Summarize: "
    max_size: 500
env:
  n_envs: 1
...
```

In running experiments on IMDB, I found that there was a very high variance in validation and test set results and I don't fully understand it, so I'm looking for...

Hi, I'm currently running the IMDB experiments and trying to reproduce the PPO and NLPO results from the paper. Although my PPO is close, NLPO is quite far from...

The BLEURT reward function fails with `TypeError: cannot pickle '_thread.RLock' object` in multiprocessing environments, probably because the TensorFlow model cannot be pickled and sent to the environment subprocess. Tested on both local and...
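The error above can be reproduced without TensorFlow, since any object holding a `threading.RLock` is unpicklable. The sketch below shows one possible workaround, assuming the cause is what the report suspects: defer building the heavy state until first use and drop it when pickling. `EagerReward` and `LazyReward` are hypothetical stand-ins, not RL4LMs classes, and the lock stands in for the model.

```python
import pickle
import threading


class EagerReward:
    """Hypothetical reward wrapper that builds unpicklable state in __init__."""

    def __init__(self):
        # Stand-in for a model object that holds a thread lock internally.
        self._model = threading.RLock()


class LazyReward:
    """Hypothetical wrapper that builds the unpicklable state on first call."""

    def __init__(self, checkpoint="hypothetical-checkpoint"):
        self.checkpoint = checkpoint
        self._model = None

    def _ensure_model(self):
        if self._model is None:
            self._model = threading.RLock()  # stand-in for loading the model
        return self._model

    def __getstate__(self):
        # Drop the unpicklable member; the receiving process rebuilds it lazily.
        state = self.__dict__.copy()
        state["_model"] = None
        return state

    def __call__(self, text):
        self._ensure_model()
        return float(len(text))  # placeholder score, not a real metric


def can_pickle(obj):
    """Return True if obj survives pickle.dumps, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except (TypeError, pickle.PicklingError):
        return False
```

With this pattern, each subprocess loads its own copy of the model the first time the reward is called, at the cost of extra memory per process.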

Hi, first of all, great work. This is a very useful library for research on RL and NLP. It will be very helpful if it's possible to add off-policy RL...

enhancement
help wanted

Hi, this is a great repo for students and researchers. May I ask whether it would be possible to release a JAX-based version of the code? Best

Hey, I am currently using your repo to fine-tune a Longformer model. The problem is that this model requires a pre-defined global attention mask (in addition to the regular attention...
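For context, a global attention mask has the same shape as the regular attention mask, with 1 marking tokens that attend globally. The sketch below uses plain Python lists and a hypothetical helper, `build_global_attention_mask`, that puts global attention on the first non-padding token of each row. That choice is an assumption for illustration (it accounts for left padding) and says nothing about how RL4LMs would need to pass the mask through.

```python
def build_global_attention_mask(attention_mask):
    """Hypothetical helper: mark the first real token of each row as global.

    attention_mask is a batch given as a list of lists of 0/1 values.
    """
    global_mask = []
    for row in attention_mask:
        mask_row = [0] * len(row)
        if 1 in row:
            # With left padding, the first real token is not at index 0.
            mask_row[row.index(1)] = 1
        global_mask.append(mask_row)
    return global_mask
```

The result would still need to be converted to a tensor and supplied to the model alongside the regular attention mask.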