warning information

Open yz-qiang opened this issue 1 year ago • 1 comments

CodeGen is a powerful model.

When I run the model with the following code:

from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("Salesforce/codegen-350M-mono")
model = AutoModelForCausalLM.from_pretrained("Salesforce/codegen-350M-mono")

text = "def hello_world():"
input_ids = tokenizer(text, return_tensors="pt").input_ids

generated_ids = model.generate(input_ids, max_length=128)
print(tokenizer.decode(generated_ids[0], skip_special_tokens=True))

However, it prints the following warnings:

The attention mask and the pad token id were not set.  As a consequence, you may observe unexpected behavior.  Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.

Do you know how I can fix it? Plus, what happens if I don't fix it?

Thank you very much!

yz-qiang, Mar 15 '23 06:03

I added pad_token_id = 50256, for example:

pad_token_id = 50256
set_seed(42, deterministic=True)
device = torch.device('cuda:0')
... ...
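
For reference, here is a minimal sketch of how both warnings might be addressed in the original snippet: pass the tokenizer's attention_mask to generate() and set pad_token_id explicitly. The helper name generate_without_warnings is hypothetical, and it assumes model and tokenizer are the objects loaded in the question. For a single unpadded prompt like the one above, the output is likely the same with or without these arguments; the attention mask mainly matters once several prompts are padded into one batch.

```python
def generate_without_warnings(model, tokenizer, text, max_length=128):
    """Hypothetical helper: generate text while passing the arguments
    the two warnings ask for."""
    # Calling the tokenizer returns input_ids and attention_mask together.
    encoded = tokenizer(text, return_tensors="pt")

    generated_ids = model.generate(
        encoded["input_ids"],
        # Tells the model which positions are real tokens (1) vs. padding (0).
        attention_mask=encoded["attention_mask"],
        # The warning reports eos_token_id as 50256 for this tokenizer;
        # reading it from the tokenizer avoids hard-coding the number.
        pad_token_id=tokenizer.eos_token_id,
        max_length=max_length,
    )
    return tokenizer.decode(generated_ids[0], skip_special_tokens=True)
```

With the model and tokenizer from the question, this would be called as print(generate_without_warnings(model, tokenizer, "def hello_world():")).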

little51, Mar 15 '23 15:03