RL4LMs
A modular RL library to fine-tune language models to human preferences
I am trying to get RL4LMs to work, and to achieve this, I've built the Docker image using the instructions in the README file. After building the container, I tried...
Make sure the transformer returns past_key_values. The code decides whether the current step is a cached (past) step based on past_key_values, which in turn affects the shape of position_ids and input_ids. When use_cache is true, the transformer will...
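The branching described above can be sketched in plain Python. This is not the actual transformers code; `prepare_inputs` is a hypothetical helper that mimics the shape logic (Hugging Face's `prepare_inputs_for_generation` behaves along these lines): when a cache is present, only the last token is fed in and position_ids are offset by the cached length.

```python
def prepare_inputs(input_ids, past_key_values=None):
    """Sketch of cached decoding: with a cache present, only the most
    recent token is passed to the model, and its position id is the
    cache length; without a cache, the full sequence and positions go in.
    Here past_key_values is just an int standing in for the cache length."""
    if past_key_values is not None:
        past_len = past_key_values
        input_ids = input_ids[-1:]            # shape collapses to one token
        position_ids = [past_len]             # single position after the cache
    else:
        position_ids = list(range(len(input_ids)))  # full position range
    return input_ids, position_ids
```

So the same call site produces differently shaped tensors depending on whether past_key_values is set, which is exactly what the issue is pointing at.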
My yaml:

```yaml
tokenizer:
  model_name: facebook/bart-large-cnn
  padding_side: left
  truncation_side: left
  pad_token_as_eos_token: False

reward_fn:
  id: rouge
  args:
    rouge_type: "rouge1"

datapool:
  id: cnn_daily_mail
  args:
    prompt_prefix: "Summarize: "
    max_size: 500

env:
  n_envs: 1
  # ...
```
In running experiments on IMDB, I found very high variance in the validation and test set results that I don't fully understand, so I'm looking for...
Hi, I'm currently running the IMDB experiments and trying to reproduce the PPO and NLPO results from the paper, and though my PPO results are close, NLPO is quite far from...
The BLEURT reward function fails with `TypeError: cannot pickle '_thread.RLock' object` in multiprocessing environments, probably because the TensorFlow model cannot be pickled to send to the environment subprocess. Tested on both local and...
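The failure mode and one common workaround can be reproduced without TensorFlow at all: a `threading.RLock` stands in for the loaded model (both are unpicklable for the same reason), and a lazy-init wrapper with `__getstate__`/`__setstate__` avoids sending the heavy object across the process boundary. The class names here are hypothetical sketches, not RL4LMs APIs.

```python
import pickle
import threading

class EagerReward:
    # Holds an RLock eagerly (like a TF model loaded in __init__),
    # so pickling the instance fails with the error from the issue.
    def __init__(self):
        self.lock = threading.RLock()

class LazyReward:
    # Defers construction until first use, so the unpicklable state
    # is rebuilt inside the subprocess instead of being pickled.
    def __init__(self):
        self._model = None

    def _ensure(self):
        if self._model is None:
            self._model = threading.RLock()  # stand-in for loading the model
        return self._model

    def __getstate__(self):
        return {}  # drop lazily built state when pickling

    def __setstate__(self, state):
        self._model = None  # will be rebuilt on first use

try:
    pickle.dumps(EagerReward())
    eager_ok = True
except TypeError:
    eager_ok = False  # "cannot pickle '_thread.RLock' object"

# The lazy wrapper round-trips through pickle without error.
lazy_copy = pickle.loads(pickle.dumps(LazyReward()))
```

The same pattern (construct the scorer on first call inside the worker) is a typical fix for reward functions used with subprocess vectorized environments.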
Hi, first of all, great work. This is a very useful library for research on RL and NLP. It will be very helpful if it's possible to add off-policy RL...
Hi, this is a great repo for students and researchers. May I ask whether it would be possible to release a JAX-based version of the code? Best
Hey, I am currently using your repo to finetune a Longformer model. The problem is that this model requires a pre-defined global attention mask (in addition to the regular attention...
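For context on the extra mask the issue mentions: Longformer's `global_attention_mask` marks with 1 the positions that attend globally (typically the first/CLS token, or task tokens), while 0 keeps the default sliding-window attention. A minimal pure-Python sketch of building such a mask (the helper name is hypothetical; in practice you would build a tensor of the same shape as `input_ids`):

```python
def build_global_attention_mask(seq_len, global_positions):
    """Return a per-token mask: 1 = global attention, 0 = local
    sliding-window attention (Longformer's convention)."""
    mask = [0] * seq_len
    for pos in global_positions:
        mask[pos] = 1
    return mask
```

Since RL4LMs constructs model inputs internally, the issue is essentially asking where to inject this extra mask alongside the regular attention mask.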