RL4LMs issues

Results 48 RL4LMs issues

'GPT2Model' object has no attribute 'first_device'

I get the following error when running `python scripts/training/train_text_generation.py --config_path scripts/training/task_configs/dialog/gpt2_ppo.yml`. I have double-checked that transformers==4.18.0. ``` Traceback (most recent call last): File "/Users/stephanehatgiskessell/Desktop/RL4LMs/scripts/training/train_text_generation.py", line 84, in main( File "/Users/stephanehatgiskessell/Desktop/RL4LMs/scripts/training/train_text_generation.py",...

Stephanehk

Using GPT-2

In the README, it is mentioned that `Actor-Critic Policies supporting causal LMs (eg. GPT-2/3) and seq2seq LMs (eg. T5, BART)`. I was wondering how I can use the GPT-2 model? I...

oroojlooy

How can I run inference with the model after PPO training?

If I have trained the model successfully with the PPO method, how can I use it for inference?

RyanYip-Kat

Error with Accelerate integration + NLPO

Hi, I'm trying to use the Accelerate integration, because otherwise with NLPO I cannot run a small model (200M parameters) with a 512-token length, not even on an 80GB A100....

avacaondata

Any plans for DeepSpeed/Accelerate integration?

Hi, great library! I'm wondering if you have any plans for DeepSpeed or Accelerate integration to train larger models (e.g., GPT-J)?

Breakend

[Question] End-to-end example

Is there an end-to-end example showing how the library should be used to train or fine-tune a language model?

farrokhsiar

In the paper, what is the detailed setting for supervised learning? Does SL use additional supervised data?

https://openreview.net/forum?id=8aHzds2uUyB Thank you very much!

guotong1988

A question that has bothered me for a long time: What is the difference between RL-for-text-generation and delete-0-reward-model-predictions?

For text generation. Thank you very much!

guotong1988

Evaluating a specific checkpoint

Hey, first of all, thank you very much for this amazing library! I was using it to fine-tune a model, and I am interested in evaluating one of the saved...

lovodkin93

Metric version incompatible

The latest metrics loaded from Hugging Face, such as rouge, require `rouge_score>=0.1.2`, but rl4lms 0.2.1 requires rouge_score==0.0.4. The two requirements are incompatible, which causes errors when running the example in the README file.

c-box
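
To make the conflict reported in the "Metric version incompatible" issue concrete, here is a minimal sketch that compares the two version numbers quoted there. The helper `parse_version` and the variable names are hypothetical and written only for illustration; they are not part of RL4LMs or `rouge_score`, and the sketch assumes plain dotted numeric versions.

```python
from typing import Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn a dotted numeric version such as '0.1.2' into a comparable tuple."""
    return tuple(int(part) for part in text.split("."))


# Versions quoted in the issue; treated here as given, not independently verified.
pinned = parse_version("0.0.4")        # rouge_score==0.0.4 (pin reported for rl4lms 0.2.1)
required_min = parse_version("0.1.2")  # rouge_score>=0.1.2 (minimum reported for the metric)

# The exact pin satisfies the metric's requirement only if it is at least the minimum.
pin_satisfies_metric = pinned >= required_min
print(pin_satisfies_metric)
```

Because an `==` pin admits exactly one version, no single install can meet both constraints as reported; one of the two requirements would have to be relaxed.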
