Error while running base script available in README.md

Open prakamya-mishra opened this issue 2 years ago • 1 comments

Hi,

I am trying to run the base script from the README:

import torch
from transformers import AutoTokenizer
from trl import PPOTrainer, PPOConfig, AutoModelForCausalLMWithValueHead, create_reference_model
from trl.core import respond_to_batch

# get models
model = AutoModelForCausalLMWithValueHead.from_pretrained('gpt2')
model_ref = create_reference_model(model)

tokenizer = AutoTokenizer.from_pretrained('gpt2')
tokenizer.pad_token = tokenizer.eos_token 

# initialize trainer
ppo_config = PPOConfig(
    batch_size=1,
)

# encode a query
query_txt = "This morning I went to the "
query_tensor = tokenizer.encode(query_txt, return_tensors="pt")

# get model response
response_tensor = respond_to_batch(model_ref, query_tensor)

# create a ppo trainer
ppo_trainer = PPOTrainer(ppo_config, model, model_ref, tokenizer)

# define a reward for response
# (this could be any reward such as human feedback or output from another model)
reward = [torch.tensor(1.0)]

# train model for one step with ppo
train_stats = ppo_trainer.step([query_tensor[0]], [response_tensor[0]], reward)

Error:

You're using a GPT2TokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.
Traceback (most recent call last):
  File "PPO.py", line 34, in <module>
    train_stats = ppo_trainer.step([query_tensor[0]], [response_tensor[0]], reward)
  File "/<PATH_TO_FOLDER>/lib/python3.8/site-packages/trl/trainer/ppo_trainer.py", line 476, in step
    train_stats = self.train_minibatch(
  File "/<PATH_TO_FOLDER>/lib/python3.8/site-packages/trl/trainer/ppo_trainer.py", line 681, in train_minibatch
    loss_p, loss_v, train_stats = self.loss(old_logprobs, values, rewards, logits, vpreds, logprobs, mask)
  File "/<PATH_TO_FOLDER>/lib/python3.8/site-packages/trl/trainer/ppo_trainer.py", line 767, in loss
    advantages = masked_whiten(advantages, mask)
  File "/<PATH_TO_FOLDER>/lib/python3.8/site-packages/trl/core.py", line 125, in masked_whiten
    mean, var = masked_mean(values, mask), masked_var(values, mask)
  File "/<PATH_TO_FOLDER>/lib/python3.8/site-packages/trl/core.py", line 109, in masked_mean
    return (values * mask).sum(axis=axis) / mask.sum(axis=axis)
RuntimeError: Please look up dimensions by name, got: name = None.

How do I resolve it?

prakamya-mishra • Mar 10 '23 00:03

Hi @prakamya-mishra, thanks for the issue. Can you please update trl by running `pip install --upgrade trl`? This should be solved by https://github.com/lvwerra/trl/pull/190
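
To confirm that the upgrade actually took effect in the environment that runs the script, a quick check like the one below can help. This is only a sketch: `installed_version` is a hypothetical helper name, and the snippet only reports what is installed; it does not know which release contains the fix.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of `package`, or None if it is missing.

    `installed_version` is a hypothetical helper used for illustration only.
    """
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Report what the current interpreter sees; run this with the same Python
# that runs the PPO script, since several environments may coexist.
trl_version = installed_version("trl")
if trl_version is None:
    print("trl is not installed in this environment")
else:
    print(f"trl {trl_version} is installed")
```

If the printed version does not change after `pip install --upgrade trl`, the upgrade probably went into a different environment than the one shown in the traceback paths.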

younesbelkada • Mar 10 '23 06:03

Closing this for now. Feel free to reopen if the issue persists.

lvwerra • Mar 21 '23 10:03
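
For readers who land here with the same traceback: the last frame is `masked_mean` calling `.sum(axis=axis)`. A plausible reading, stated here as an assumption rather than a confirmed diagnosis, is that `axis` is left at `None` and some PyTorch versions reject `None` as a dimension argument. The sketch below shows a masked mean that avoids passing `None` through; it is an illustration of the idea, not necessarily the change made in the linked pull request.

```python
import torch


def masked_mean(values, mask, axis=None):
    """Mean of `values` over the positions where `mask` is 1.

    Sketch only: when `axis` is None, reduce over all elements without
    forwarding None to `Tensor.sum`, which some PyTorch versions reject.
    """
    if axis is None:
        return (values * mask).sum() / mask.sum()
    return (values * mask).sum(dim=axis) / mask.sum(dim=axis)


values = torch.tensor([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
mask = torch.tensor([[1.0, 1.0, 0.0], [1.0, 0.0, 0.0]])

# Mean over every unmasked element: (1 + 2 + 4) / 3
overall = masked_mean(values, mask)

# Mean per row over unmasked elements: [(1 + 2) / 2, 4 / 1]
per_row = masked_mean(values, mask, axis=1)

print(overall, per_row)
```

Upgrading trl as suggested above remains the recommended route; editing files under `site-packages` by hand is fragile and is lost on the next install.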