
A repo for distributed training of language models with Reinforcement Learning from Human Feedback (RLHF)

Results: 135 trlx issues

**Bug** Hello, I am trying to run the summarize_rlhf example following [this blog on wandb](https://wandb.ai/carperai/summarize_RLHF/reports/Implementing-RLHF-Learning-to-Summarize-with-trlX--VmlldzozMzAwODM2). The script is failing with the attached logs; however, I am not able to locate the...

### 🐛 Describe the bug

Here's my `TrainConfig`:

```python
default_config = TRLConfig(
    train=TrainConfig(
        seq_length=512,
        epochs=10000,
        total_steps=10000,
        batch_size=8,
        checkpoint_interval=10000,
        eval_interval=500,
        pipeline="PromptPipeline",
        trainer="AcceleratePPOTrainer",
        checkpoint_dir="checkpoints/ppo_hh",
    ),
    model=ModelConfig(model_path="tiiuae/falcon-7b-instruct", num_layers_unfrozen=2),
    tokenizer=TokenizerConfig(tokenizer_path="tiiuae/falcon-7b-instruct", truncation_side="left"),
    optimizer=OptimizerConfig(name="adamw", kwargs=dict(lr=1e-6, betas=(0.9,...
```

bug

Implementation of multi-generation RL in trlX. A suggested (but optional) external inference pipeline wrapper can be found [here](https://github.com/CarperAI/autocrit/pull/16).

### 🚀 The feature, motivation, and pitch As Falcon 7B/40B have been dominating the open-source LLM community, it would be amazing to use RLHF tuning on them. Are there...

feature request

When I use trlx to fine-tune Flan-T5-Large on a single GPU, the memory used is about 11 GB; however, when I use accelerate for parallel training, the memory used is 4×16 GB! I...
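The blow-up described above is consistent with plain data parallelism, where every rank keeps a full replica of the fp32 weights, gradients, and Adam optimizer state. A back-of-envelope sketch (the Flan-T5-Large parameter count is approximate, and the 16-bytes-per-parameter figure assumes fp32 AdamW with no activation memory; ZeRO stage 3 sharding is one way to reclaim the replicated state):

```python
def ddp_bytes_per_gpu(n_params):
    # fp32 weights (4 B) + grads (4 B) + Adam m and v (4 B + 4 B) = 16 B/param,
    # fully replicated on every rank under plain data parallelism
    return 16 * n_params

def zero3_bytes_per_gpu(n_params, world_size):
    # ZeRO stage 3 shards weights, gradients, and optimizer state across ranks
    return 16 * n_params / world_size

flan_t5_large = 780_000_000  # ~0.78B parameters (approximate)
print(ddp_bytes_per_gpu(flan_t5_large) / 2**30)            # ~11.6 GiB per GPU
print(zero3_bytes_per_gpu(flan_t5_large, 4) / 2**30)       # ~2.9 GiB per GPU
```

The ~11.6 GiB estimate roughly matches the single-GPU figure reported above; the 4×16 GB case adds activations and per-process CUDA overhead on top of the replicated state.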

To improve compatibility across models initialized from different open-source checkpoints, people may want to add some tokens for better downstream tuning purposes. For example, to improve our policy's...

Hi, I'm trying ILQL training with a GPT-J model trained with [this](https://github.com/CarperAI/trlx/blob/main/examples/summarize_rlhf/sft/train_gptj_summarize.py) code. I don't have this problem with the [original pre-trained model](https://huggingface.co/EleutherAI/gpt-j-6b), nor with a flan-xl. ``` Traceback...

### 🚀 The feature, motivation, and pitch Hey all! Appreciate the work. Is there any word on whether DPO [(Direct Preference Optimization)](https://arxiv.org/abs/2305.18290) will be integrated into the trlx library soon?...

feature request
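For context on the request above: DPO replaces the PPO reward-model loop with a single classification-style loss on preference pairs. A minimal sketch of the per-pair loss from the paper, not trlx code — the sequence log-probabilities are assumed to come from the policy and a frozen reference model, and `beta=0.1` is an illustrative value:

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    # DPO per-pair loss: -log sigmoid(beta * (margin_chosen - margin_rejected)),
    # where each margin is the policy-vs-reference log-prob difference.
    logits = beta * ((policy_chosen_logp - ref_chosen_logp)
                     - (policy_rejected_logp - ref_rejected_logp))
    # -log sigmoid(x) == log(1 + exp(-x)), written with log1p for stability
    return math.log1p(math.exp(-logits))
```

When the policy matches the reference the loss is log 2; it decreases as the policy assigns relatively more probability to the chosen completion than the rejected one.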