RL4LMs
A modular RL library to fine-tune language models to human preferences
I am trying to get RL4LMs to work, and to achieve this, I've built the Docker image using the instructions in the README file. After building the container, I tried...
Make sure the transformer returns past_key_values. The code decides whether the current step is a cached (past) step based on past_key_values, which in turn affects the shape of position_ids and input_ids. When use_cache is true, the transformer will...
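The branching described above can be sketched in plain Python. This is not the actual transformers code; `prepare_inputs` is a hypothetical helper that mimics the shape logic (Hugging Face's `prepare_inputs_for_generation` behaves along these lines): when a cache is present, only the last token is fed in and position_ids are offset by the cached length.

```python
def prepare_inputs(input_ids, past_key_values=None):
    """Sketch of cached decoding: with a cache present, only the most
    recent token is passed to the model, and its position id is the
    cache length; without a cache, the full sequence and positions go in.
    Here past_key_values is just an int standing in for the cache length."""
    if past_key_values is not None:
        past_len = past_key_values
        input_ids = input_ids[-1:]            # shape collapses to one token
        position_ids = [past_len]             # single position after the cache
    else:
        position_ids = list(range(len(input_ids)))  # full position range
    return input_ids, position_ids
```

So the same call site produces differently shaped tensors depending on whether past_key_values is set, which is exactly what the issue is pointing at.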
My yaml:

```yaml
tokenizer:
  model_name: facebook/bart-large-cnn
  padding_side: left
  truncation_side: left
  pad_token_as_eos_token: False

reward_fn:
  id: rouge
  args:
    rouge_type: "rouge1"

datapool:
  id: cnn_daily_mail
  args:
    prompt_prefix: "Summarize: "
    max_size: 500

env:
  n_envs: 1
  # ...
```
In running experiments on IMDB, I found very high variance in the validation and test set results that I don't fully understand, so I'm looking for...
Hi, I'm currently running the IMDB experiments and trying to reproduce the PPO and NLPO results from the paper, and though my PPO results are close, NLPO is quite far from...
The BLEURT reward function fails with `TypeError: cannot pickle '_thread.RLock' object` in multiprocessing environments, probably because the TensorFlow model cannot be pickled to send to the environment subprocess. Tested on both local and...
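The failure mode and one common workaround can be reproduced without TensorFlow at all: a `threading.RLock` stands in for the loaded model (both are unpicklable for the same reason), and a lazy-init wrapper with `__getstate__`/`__setstate__` avoids sending the heavy object across the process boundary. The class names here are hypothetical sketches, not RL4LMs APIs.

```python
import pickle
import threading

class EagerReward:
    # Holds an RLock eagerly (like a TF model loaded in __init__),
    # so pickling the instance fails with the error from the issue.
    def __init__(self):
        self.lock = threading.RLock()

class LazyReward:
    # Defers construction until first use, so the unpicklable state
    # is rebuilt inside the subprocess instead of being pickled.
    def __init__(self):
        self._model = None

    def _ensure(self):
        if self._model is None:
            self._model = threading.RLock()  # stand-in for loading the model
        return self._model

    def __getstate__(self):
        return {}  # drop lazily built state when pickling

    def __setstate__(self, state):
        self._model = None  # will be rebuilt on first use

try:
    pickle.dumps(EagerReward())
    eager_ok = True
except TypeError:
    eager_ok = False  # "cannot pickle '_thread.RLock' object"

# The lazy wrapper round-trips through pickle without error.
lazy_copy = pickle.loads(pickle.dumps(LazyReward()))
```

The same pattern (construct the scorer on first call inside the worker) is a typical fix for reward functions used with subprocess vectorized environments.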
Hi, first of all, great work. This is a very useful library for research on RL and NLP. It will be very helpful if it's possible to add off-policy RL...
Hi, this is a great repo for students and researchers. May I ask whether it would be possible to release a JAX-based version of the code? Best
Hey, I am currently using your repo to finetune a Longformer model. The problem is that this model requires a pre-defined global attention mask (in addition to the regular attention...
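For context on the extra mask the issue mentions: Longformer's `global_attention_mask` marks with 1 the positions that attend globally (typically the first/CLS token, or task tokens), while 0 keeps the default sliding-window attention. A minimal pure-Python sketch of building such a mask (the helper name is hypothetical; in practice you would build a tensor of the same shape as `input_ids`):

```python
def build_global_attention_mask(seq_len, global_positions):
    """Return a per-token mask: 1 = global attention, 0 = local
    sliding-window attention (Longformer's convention)."""
    mask = [0] * seq_len
    for pos in global_positions:
        mask[pos] = 1
    return mask
```

Since RL4LMs constructs model inputs internally, the issue is essentially asking where to inject this extra mask alongside the regular attention mask.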