RL4LMs issues

Results: 48 RL4LMs issues

Pip install error with gym and torch

Hi, I encountered this error when installing the RL4LMs library with `pip install -e .`. The message says: > 'extras_require' must be a dictionary whose values are strings or...

BaleChen

NLPO Code Error and Query About gymnasium vs gym Usage

I hope this message finds you well. I am writing to report an issue I encountered in the NLPO project that you maintain on GitHub. While executing the following block...

jinyilun718

Reproducing existing results on NarrativeQA

I'm trying to reproduce the results for NarrativeQA by directly running the command with the .yml configuration files. Below are the performances measured with ROUGE-L-Max. For PPO with supervision, I...

yxk23

Memory issue in metric evals?

Hi all, I am encountering a GPU memory issue in metric evaluations. I am using the following metrics: ``` metrics: - id: meteor args: {} - id: rouge - id:...

AnujMahajanOxf

is multi-dimensional reward supported?

Hi, thanks for publishing this awesome library. Can I add a configuration option or modify `reward.py` to return a vector instead of a scalar reward?

zabir-nabil
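
This listing does not say whether the library accepts vector-valued rewards. A common workaround in RL setups that expect a scalar is to collapse the reward components with a weighted sum before returning them. The following is a minimal sketch under that assumption; `scalarize_reward` and its parameters are hypothetical names for illustration and are not part of RL4LMs.

```python
from typing import Sequence


def scalarize_reward(components: Sequence[float], weights: Sequence[float]) -> float:
    """Collapse a reward vector into one scalar via a weighted sum.

    Hypothetical helper, not an RL4LMs API: it assumes the training loop
    only accepts a scalar reward per step.
    """
    if len(components) != len(weights):
        raise ValueError("components and weights must have the same length")
    return sum(w * c for w, c in zip(weights, components))


# Example: two reward dimensions with illustrative weights.
reward = scalarize_reward([1.0, 2.0], [0.5, 0.25])
```

The weights encode the trade-off between reward dimensions, so they become an extra hyperparameter to tune.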

Bloom Supporting

The repository uses transformers version 4.18, which does not support BLOOM. Is there any way to use BLOOM as the initial policy for training?

c-box

CPU Support Minor Bug

Hello, I believe I found a minor bug in `IntentAccuracyDailyDialog`, lines 672-673 in `envs/text_generation/metric.py`. The device is currently set with the following two lines: ``` self._device = "cuda" if torch.cuda.is_available()...

tedmoskovitz

Fix IndexError when loading checkpoints

In lines 104-105 of rl4lms/envs/text_generation/warm_start.py, an IndexError occurs if any filename does not contain "_". Here is the crash: ``` key=lambda ckpt: int(ckpt.split("_")[1])) IndexError: list index out of range...

Runingtime
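
The crash reported above comes from indexing `ckpt.split("_")[1]` on a filename that has no underscore. One way to avoid it is to skip names that do not yield an integer index before picking the latest checkpoint. This is a minimal sketch and not the fix actually merged into the repository; `checkpoint_index`, `latest_checkpoint`, and the example filenames are hypothetical.

```python
from typing import List, Optional


def checkpoint_index(filename: str) -> Optional[int]:
    """Return the integer after the first underscore, or None if there is none."""
    parts = filename.split("_")
    if len(parts) < 2:
        # No underscore: this is the case that raised IndexError.
        return None
    try:
        return int(parts[1])
    except ValueError:
        # An underscore is present but what follows is not an integer.
        return None


def latest_checkpoint(filenames: List[str]) -> Optional[str]:
    """Pick the filename with the highest index, ignoring unparseable names."""
    indexed = [(checkpoint_index(name), name) for name in filenames]
    valid = [(idx, name) for idx, name in indexed if idx is not None]
    if not valid:
        return None
    return max(valid, key=lambda pair: pair[0])[1]


# Hypothetical directory listing with one stray file.
newest = latest_checkpoint(["checkpoint_2", "checkpoint_10", "notes.txt"])
```

Comparing parsed integers rather than strings also keeps `checkpoint_10` ahead of `checkpoint_2`.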

Bug while loading t5 base model

I am trying to load the t5 base model as per the t5_ppo config. Strangely, this error appears, although t5-small works fine. ``` size mismatch for decoder.final_layer_norm.weight: copying a param with...

Sahajtomar

model.generate.scores returning two scores

Dear contributors, Thank you so much! This repo is excellent! What is the difference between raw_logits and processed_logits? How do they differ from the normal Hugging Face model.generate.score? Thank you,...

debjitpaul
