RL4LMs
A modular RL library to fine-tune language models to human preferences
According to the source code of [class IntentAccuracyDailyDialog(BaseMetric)](https://github.com/allenai/RL4LMs/blob/97df0bd2f7406a906206c9610aea795fbf52884c/rl4lms/envs/text_generation/metric.py#L663), the intent likelihood of utterances on DailyDialog is computed by `rajkumarrrk/roberta-daily-dialog-intent-classifier`. However, according to the `config.json` of this classifier, it is used...
Are there any plans to port the library to torch 2? Since the `parallelize()` method is deprecated in torch 2, it becomes impossible to train larger models like LLaMA 7B...
Hey, are there any plans to add support for mixed-precision training? I did see that a temporary solution was suggested in #12, but it still throws multiple exceptions relating to...
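For context, mixed-precision training in plain PyTorch is usually wired up with `torch.autocast` plus a gradient scaler. The following is only a minimal sketch on a toy model, not RL4LMs' own training loop (which would need these calls threaded through its PPO trainer):

```python
import torch
from torch import nn

# Toy model standing in for the policy network (assumption: this is not
# RL4LMs' actual trainer, just the generic torch.amp pattern).
model = nn.Linear(8, 2)
opt = torch.optim.SGD(model.parameters(), lr=0.1)

use_cuda = torch.cuda.is_available()
# GradScaler is a no-op when disabled, so the same loop also runs on CPU.
scaler = torch.cuda.amp.GradScaler(enabled=use_cuda)

x = torch.randn(4, 8)
y = torch.randint(0, 2, (4,))

# Forward pass under autocast: fp16 on GPU, bf16 on CPU.
with torch.autocast(device_type="cuda" if use_cuda else "cpu",
                    dtype=torch.float16 if use_cuda else torch.bfloat16):
    loss = nn.functional.cross_entropy(model(x), y)

scaler.scale(loss).backward()  # scale the loss to avoid fp16 gradient underflow
scaler.step(opt)               # unscales gradients, then calls opt.step()
scaler.update()
```

The scaler/unscale dance only matters for fp16; with bf16 the dynamic range is wide enough that a plain `loss.backward()` usually suffices.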
Hi there, I'm hitting OOM errors when running the summarization example on an 80GB A100 (CUDA 11.8). I'm also getting some TensorFlow/TensorRT warnings; I'm wondering if it's related to that...
For example, suppose we ask the model to generate a program rather than a simple continuation. If we do not fine-tune it first, RL does not even know what to generate. I...
Do you have any plans to apply the recently published Reinforced Self-Training (ReST)? Reinforced Self-Training (ReST) for Language Modeling: https://arxiv.org/abs/2308.08998
num_beams
Tried to set the `num_beams` generation parameter to 3, but got an error. Config:

```yaml
tokenizer:
  model_name: "t5-base"
  padding_side: right
  truncation_side: right
  truncation: True
  padding: True
  max_length: 128
  # pad_token_as_eos_token:...
```
Previously, a non-editable `pip install` failed to include certain subpackages, such as `algorithms`. Adding `__init__.py` files fixes this.
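The mechanism behind this fix: setuptools' `find_packages()` only discovers directories that contain an `__init__.py`, so a subpackage without one is silently left out of a regular (non-editable) install. A small sketch with a hypothetical `mylib/algorithms` layout (RL4LMs' actual tree differs):

```python
import pathlib
import tempfile

from setuptools import find_packages

# Hypothetical package layout for illustration only.
root = pathlib.Path(tempfile.mkdtemp())
(root / "mylib").mkdir()
(root / "mylib" / "__init__.py").write_text("")
(root / "mylib" / "algorithms").mkdir()  # no __init__.py yet

# Without __init__.py, the subpackage is not discovered...
print(find_packages(str(root)))  # ['mylib']

# ...and adding one makes it part of the distribution.
(root / "mylib" / "algorithms" / "__init__.py").write_text("")
print(sorted(find_packages(str(root))))  # ['mylib', 'mylib.algorithms']
```

An editable install masks the problem because imports resolve against the source tree directly, which is why only the non-editable `pip install` was affected.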