RL4LMs

A modular RL library to fine-tune language models to human preferences

Results 48 RL4LMs issues

I get the following error when running ```python scripts/training/train_text_generation.py --config_path scripts/training/task_configs/dialog/gpt2_ppo.yml```. I have double-checked that transformers==4.18.0. ``` Traceback (most recent call last): File "/Users/stephanehatgiskessell/Desktop/RL4LMs/scripts/training/train_text_generation.py", line 84, in main( File "/Users/stephanehatgiskessell/Desktop/RL4LMs/scripts/training/train_text_generation.py",...

The README mentions `Actor-Critic Policies supporting causal LMs (eg. GPT-2/3) and seq2seq LMs (eg. T5, BART)`. I was wondering how I can use a GPT-2 model? I...

If I have trained a model successfully with the PPO method, how can I use it for inference?

Hi, I'm trying to use the Accelerate integration, because otherwise with NLPO I cannot run a small model (200M parameters) with a sequence length of 512 tokens, not even on an 80GB A100....

Hi, great library! I'm wondering if you have any plans for DeepSpeed or Accelerate integration to train larger models (e.g., GPT-J)?

Is there an end-to-end example showing how the library should be used to train or fine-tune a language model?

Hey, first of all thank you very much for this amazing library! I was using it to fine-tune a model, and I am interested in evaluating one of the saved...

The latest metrics loaded from Hugging Face, such as rouge, require `rouge_score>=0.1.2`, but rl4lms 0.2.1 requires `rouge_score==0.0.4`, which is incompatible. This causes errors when running the example in the README.
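
The conflict described in this issue can be illustrated with a small check: an exact pin can satisfy a minimum-version requirement only if the pinned version is at least that minimum. The sketch below is a minimal illustration, not part of RL4LMs or pip; `parse_version` and `pin_satisfies_minimum` are hypothetical helper names, and it assumes plain dotted numeric versions like the two quoted above.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Hypothetical helper: turn a dotted numeric version such as '0.1.2' into (0, 1, 2)."""
    return tuple(int(part) for part in version.split("."))


def pin_satisfies_minimum(pinned: str, minimum: str) -> bool:
    """Hypothetical helper: True if an exact pin (==pinned) also satisfies >=minimum."""
    return parse_version(pinned) >= parse_version(minimum)


# Versions quoted in the issue above.
RL4LMS_PIN = "0.0.4"      # rl4lms 0.2.1 requires rouge_score==0.0.4
METRIC_MINIMUM = "0.1.2"  # the rouge metric requires rouge_score>=0.1.2

# The two requirements conflict when the pin is below the minimum.
conflict = not pin_satisfies_minimum(RL4LMS_PIN, METRIC_MINIMUM)
print(f"conflict: {conflict}")  # prints "conflict: True"
```

Because (0, 0, 4) compares lower than (0, 1, 2), no single installed `rouge_score` version can meet both requirements at once.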