trlx
A repo for distributed training of language models with Reinforcement Learning via Human Feedback (RLHF)
### 🚀 The feature, motivation, and pitch Using tiny models for the tests may speed up the tests by a factor of 2 or 3, while still effectively verifying the...
### 🚀 The feature, motivation, and pitch I'm trying to apply RL to a code generation LM: https://huggingface.co/docs/transformers/model_doc/codegen, but unfortunately I get the error below: ``` ValueError: Unsupported architecture: `CodeGenForCausalLM`. The following...
### 🚀 The feature, motivation, and pitch AccelerateRLTrainer.evaluate() logs a table of generated eval outputs and metrics to the metrics tracker. If I understand correctly, only scalar metrics are currently...
### 🚀 The feature, motivation, and pitch It is very exciting to see your remarkable effort to open-source this repo! I read through the StableVicuna blog post and noticed that the...
### 🚀 The feature, motivation, and pitch Let's migrate to [`peft`](https://github.com/huggingface/peft). ##### Tasks Doing so will require the following updates: 1. Replace the `opendelta` setup in the `AccelerateBaseTrainer` with a...
### 🐛 Describe the bug When the `trlX` trainer makes a call to `model.generate` in the rollout phase, the process errors out with the following message: ```RuntimeError: probability tensor contains...
### 🚀 The feature, motivation, and pitch trlX uses HuggingFace accelerate under the hood. Accelerate has the capability to leverage Google's TPUs for faster training. I'm interested in supporting trlX...
### 🐛 Describe the bug When I run the code in summarization-rlhf using GPT2Chinese, the following error occurs. I have checked "special_tokens_map.json", and it does have the "[PAD]" token....
https://github.com/CarperAI/trlx/blob/9fdd0d757e8f7a3d48e7edb060ddb7517da13d2d/trlx/trainer/accelerate_ppo_trainer.py#L399 I hit that error when precomputing logprobs and values, due to the concatenation of prompt_tensor and output. My question is: why do we concatenate the prompt_tensor and output in the...