
A repo for distributed training of language models with Reinforcement Learning from Human Feedback (RLHF)

Results: 135 trlx issues

**Bug** Hello, I am trying to run the summarize_rlhf example following [this blog on wandb](https://wandb.ai/carperai/summarize_RLHF/reports/Implementing-RLHF-Learning-to-Summarize-with-trlX--VmlldzozMzAwODM2). The script is failing with the attached logs; however, I am not able to locate the...

### 🐛 Describe the bug

Here's my `TrainConfig`:

```python
default_config = TRLConfig(
    train=TrainConfig(
        seq_length=512,
        epochs=10000,
        total_steps=10000,
        batch_size=8,
        checkpoint_interval=10000,
        eval_interval=500,
        pipeline="PromptPipeline",
        trainer="AcceleratePPOTrainer",
        checkpoint_dir="checkpoints/ppo_hh",
    ),
    model=ModelConfig(model_path="tiiuae/falcon-7b-instruct", num_layers_unfrozen=2),
    tokenizer=TokenizerConfig(tokenizer_path="tiiuae/falcon-7b-instruct", truncation_side="left"),
    optimizer=OptimizerConfig(name="adamw", kwargs=dict(lr=1e-6, betas=(0.9,...
```

bug

Implementation of multi-generation RL in trlX. A suggested (but optional) external inference pipeline wrapper can be found [here](https://github.com/CarperAI/autocrit/pull/16).

### 🚀 The feature, motivation, and pitch As Falcon 7B/40B have been dominating the open-source LLM community, it would be amazing to use RLHF tuning on them. Are there...

feature request

When I use trlx to fine-tune Flan-T5-Large on a single GPU, the memory used is about 11 GB; however, when I use accelerate for parallel training, the memory used is 4×16 GB! I...
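The blow-up described above is consistent with plain data parallelism, where every rank keeps a full replica of the fp32 weights, gradients, and Adam optimizer state. A back-of-envelope sketch (the Flan-T5-Large parameter count is approximate, and the 16-bytes-per-parameter figure assumes fp32 AdamW with no activation memory; ZeRO stage 3 sharding is one way to reclaim the replicated state):

```python
def ddp_bytes_per_gpu(n_params):
    # fp32 weights (4 B) + grads (4 B) + Adam m and v (4 B + 4 B) = 16 B/param,
    # fully replicated on every rank under plain data parallelism
    return 16 * n_params

def zero3_bytes_per_gpu(n_params, world_size):
    # ZeRO stage 3 shards weights, gradients, and optimizer state across ranks
    return 16 * n_params / world_size

flan_t5_large = 780_000_000  # ~0.78B parameters (approximate)
print(ddp_bytes_per_gpu(flan_t5_large) / 2**30)            # ~11.6 GiB per GPU
print(zero3_bytes_per_gpu(flan_t5_large, 4) / 2**30)       # ~2.9 GiB per GPU
```

The ~11.6 GiB estimate roughly matches the single-GPU figure reported above; the 4×16 GB case adds activations and per-process CUDA overhead on top of the replicated state.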

To improve compatibility across models initialized from different open-source checkpoints, people may want to add some tokens for better downstream tuning purposes. For example, to improve our policy's...

Hi, I'm trying ILQL training with a GPT-J model trained with [this](https://github.com/CarperAI/trlx/blob/main/examples/summarize_rlhf/sft/train_gptj_summarize.py) code. I don't have this problem with the [original pre-trained model](https://huggingface.co/EleutherAI/gpt-j-6b), nor with a flan-xl. ``` Traceback...

### 🚀 The feature, motivation, and pitch Hey all! Appreciate the work. Is there any word on whether DPO [(Direct Preference Optimization)](https://arxiv.org/abs/2305.18290) will be integrated into the trlx library soon?...

feature request
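For context on the request above: DPO replaces the PPO reward-model loop with a single classification-style loss on preference pairs. A minimal sketch of the per-pair loss from the paper, not trlx code — the sequence log-probabilities are assumed to come from the policy and a frozen reference model, and `beta=0.1` is an illustrative value:

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    # DPO per-pair loss: -log sigmoid(beta * (margin_chosen - margin_rejected)),
    # where each margin is the policy-vs-reference log-prob difference.
    logits = beta * ((policy_chosen_logp - ref_chosen_logp)
                     - (policy_rejected_logp - ref_rejected_logp))
    # -log sigmoid(x) == log(1 + exp(-x)), written with log1p for stability
    return math.log1p(math.exp(-logits))
```

When the policy matches the reference the loss is log 2; it decreases as the policy assigns relatively more probability to the chosen completion than the rejected one.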