RL4LMs

A modular RL library to fine-tune language models to human preferences

48 RL4LMs issues

According to the source code of [class IntentAccuracyDailyDialog(BaseMetric)](https://github.com/allenai/RL4LMs/blob/97df0bd2f7406a906206c9610aea795fbf52884c/rl4lms/envs/text_generation/metric.py#L663), the intent likelihood of utterances on DailyDialog is computed by `rajkumarrrk/roberta-daily-dialog-intent-classifier`. However, according to the `config.json` of this classifier, it is used...

Are there any plans to port the library to torch 2? Since the `parallelize()` method is deprecated in torch 2, it becomes impossible to train larger models like LLaMA 7B...

Hey, are there any plans to add support for mixed-precision training? I did see that a temporary solution was suggested in #12, but it still throws multiple exceptions relating to...
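As background for the question above (this is not RL4LMs' API, just a minimal sketch of PyTorch-native mixed precision): `torch.autocast` runs selected ops in lower precision, and `GradScaler` guards fp16 gradients on CUDA. The linear model and optimizer below are placeholders.

```python
import torch

# Minimal mixed-precision sketch (placeholder model, not RL4LMs-specific).
device = "cuda" if torch.cuda.is_available() else "cpu"
model = torch.nn.Linear(8, 2).to(device)
opt = torch.optim.SGD(model.parameters(), lr=0.1)

# GradScaler is only needed for fp16 on CUDA; disabled elsewhere it is a no-op.
scaler = torch.cuda.amp.GradScaler(enabled=(device == "cuda"))

x = torch.randn(4, 8, device=device)
# CUDA autocast typically uses float16; CPU autocast supports bfloat16.
with torch.autocast(device_type=device,
                    dtype=torch.float16 if device == "cuda" else torch.bfloat16):
    loss = model(x).sum()

scaler.scale(loss).backward()  # scales the loss before backward (fp16 only)
scaler.step(opt)               # unscales grads, skips step on inf/nan
scaler.update()
```

Exceptions like those reported often come from mixing autocast with code paths that assume fp32 tensors, so any fix in the library would need to keep generation and loss computation inside a consistent autocast context.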

Hi there, I'm hitting OOM errors when running the summarization example on an 80GB A100 (CUDA 11.8). I'm also getting some TensorFlow/TensorRT warnings, and I'm wondering if it's related to that...

For example, suppose we ask the model to generate a program rather than a simple continuation. If we do not fine-tune the model first, RL does not even know what to generate. I...

Do you have any plans to apply the recently published Reinforced Self-Training (ReST)? Reinforced Self-Training (ReST) for Language Modeling https://arxiv.org/abs/2308.08998

Tried to set the `num_beams` generation parameter to 3, but got an error. Config:

```yaml
tokenizer:
  model_name: "t5-base"
  padding_side: right
  truncation_side: right
  truncation: True
  padding: True
  max_length: 128
  # pad_token_as_eos_token:...
```

Labels: bug, beam_search

Previously, a non-editable `pip install` failed to include certain packages such as `algorithms`. Adding `__init__.py` files fixes this.
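The fix above works because setuptools' default package discovery, `find_packages()`, only treats directories containing an `__init__.py` as packages; subpackages without one are silently dropped from a non-editable install. A small demonstration (the `pkg`/`algorithms` layout here is illustrative, not RL4LMs' actual tree):

```python
import os
import tempfile
from setuptools import find_packages

# Build a toy project: pkg/ has __init__.py, pkg/algorithms/ does not.
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "pkg", "algorithms"))
open(os.path.join(root, "pkg", "__init__.py"), "w").close()

# Only "pkg" is discovered; "pkg.algorithms" is missing from the install.
print(find_packages(where=root))

# After adding the missing __init__.py, the subpackage is picked up too.
open(os.path.join(root, "pkg", "algorithms", "__init__.py"), "w").close()
print(find_packages(where=root))
```

An alternative is `find_namespace_packages()`, which does not require `__init__.py`, but adding the marker files is the smaller change for an existing `find_packages()`-based setup.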