RL4LMs

A modular RL library to fine-tune language models to human preferences

48 RL4LMs issues

According to the source code of [class IntentAccuracyDailyDialog(BaseMetric)](https://github.com/allenai/RL4LMs/blob/97df0bd2f7406a906206c9610aea795fbf52884c/rl4lms/envs/text_generation/metric.py#L663), the intent likelihood of utterances on DailyDialog is computed by `rajkumarrrk/roberta-daily-dialog-intent-classifier`. However, according to the `config.json` of this classifier, it is used...

Are there any plans to port the library to torch 2? Since the `parallelize()` method is deprecated in torch 2, it becomes impossible to train larger models like LLaMA 7B...

Hey, are there any plans to add support for mixed-precision training? I did see that a temporary solution was suggested in #12, but it still throws multiple exceptions relating to...
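As background for the question above (this is not RL4LMs' API, just a minimal sketch of PyTorch-native mixed precision): `torch.autocast` runs selected ops in lower precision, and `GradScaler` guards fp16 gradients on CUDA. The linear model and optimizer below are placeholders.

```python
import torch

# Minimal mixed-precision sketch (placeholder model, not RL4LMs-specific).
device = "cuda" if torch.cuda.is_available() else "cpu"
model = torch.nn.Linear(8, 2).to(device)
opt = torch.optim.SGD(model.parameters(), lr=0.1)

# GradScaler is only needed for fp16 on CUDA; disabled elsewhere it is a no-op.
scaler = torch.cuda.amp.GradScaler(enabled=(device == "cuda"))

x = torch.randn(4, 8, device=device)
# CUDA autocast typically uses float16; CPU autocast supports bfloat16.
with torch.autocast(device_type=device,
                    dtype=torch.float16 if device == "cuda" else torch.bfloat16):
    loss = model(x).sum()

scaler.scale(loss).backward()  # scales the loss before backward (fp16 only)
scaler.step(opt)               # unscales grads, skips step on inf/nan
scaler.update()
```

Exceptions like those reported often come from mixing autocast with code paths that assume fp32 tensors, so any fix in the library would need to keep generation and loss computation inside a consistent autocast context.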

Hi there, I'm hitting OOM errors when running the summarization example on an 80GB A100 (CUDA 11.8). I'm also getting some TensorFlow/TensorRT warnings, and I'm wondering if it's related to that...

For example, suppose we ask the model to generate a program rather than a simple continuation. If we do not fine-tune the model first, RL does not even know what to generate. I...

Do you have any plans to apply the recently published Reinforced Self-Training (ReST)? Reinforced Self-Training (ReST) for Language Modeling https://arxiv.org/abs/2308.08998

Tried to set the `num_beams` generation parameter to 3, but got an error. Config:

```yaml
tokenizer:
  model_name: "t5-base"
  padding_side: right
  truncation_side: right
  truncation: True
  padding: True
  max_length: 128
  # pad_token_as_eos_token:...
```

Labels: bug, beam_search

Previously, a non-editable `pip install` failed to include certain packages such as `algorithms`. Adding `__init__.py` files fixes this.
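The fix above works because setuptools' default package discovery, `find_packages()`, only treats directories containing an `__init__.py` as packages; subpackages without one are silently dropped from a non-editable install. A small demonstration (the `pkg`/`algorithms` layout here is illustrative, not RL4LMs' actual tree):

```python
import os
import tempfile
from setuptools import find_packages

# Build a toy project: pkg/ has __init__.py, pkg/algorithms/ does not.
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "pkg", "algorithms"))
open(os.path.join(root, "pkg", "__init__.py"), "w").close()

# Only "pkg" is discovered; "pkg.algorithms" is missing from the install.
print(find_packages(where=root))

# After adding the missing __init__.py, the subpackage is picked up too.
open(os.path.join(root, "pkg", "algorithms", "__init__.py"), "w").close()
print(find_packages(where=root))
```

An alternative is `find_namespace_packages()`, which does not require `__init__.py`, but adding the marker files is the smaller change for an existing `find_packages()`-based setup.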