CodeGen
warning information
CodeGen is a powerful model.
When I use the model with the following code:
from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("Salesforce/codegen-350M-mono")
model = AutoModelForCausalLM.from_pretrained("Salesforce/codegen-350M-mono")
text = "def hello_world():"
input_ids = tokenizer(text, return_tensors="pt").input_ids
generated_ids = model.generate(input_ids, max_length=128)
print(tokenizer.decode(generated_ids[0], skip_special_tokens=True))
However, it prints the following warnings:
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
Do you know how I can fix it? Plus, what happens if I don't fix it?
Thank you very much!
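For reference, one way to address both warnings is to keep the full tokenizer output (which contains the attention_mask) and pass an explicit pad_token_id to generate. A minimal sketch using the same checkpoint (the variable names are illustrative, not from the original post):
from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("Salesforce/codegen-350M-mono")
model = AutoModelForCausalLM.from_pretrained("Salesforce/codegen-350M-mono")
# Keep the whole tokenizer output so the attention mask is available.
inputs = tokenizer("def hello_world():", return_tensors="pt")
generated_ids = model.generate(
    inputs.input_ids,
    attention_mask=inputs.attention_mask,  # marks which tokens are real (all ones here)
    pad_token_id=tokenizer.eos_token_id,   # CodeGen has no pad token, so EOS (50256) is reused
    max_length=128,
)
print(tokenizer.decode(generated_ids[0], skip_special_tokens=True))
As for what happens if you don't fix it: for a single, unpadded prompt like this the attention mask is all ones anyway, so the output does not change; the warning only becomes a real problem once you generate from batched, padded inputs.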
I added pad_token_id = 50256, like this:
import torch
from transformers import set_seed

pad_token_id = 50256              # CodeGen's eos_token_id, reused as the pad token
set_seed(42, deterministic=True)  # seed the RNGs for reproducible generation
device = torch.device('cuda:0')
... ...
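If you later batch several prompts, the attention mask genuinely matters, because the padded positions would otherwise be fed to the model as real tokens. A hedged sketch of the batched case, reusing the tokenizer and model from above (the prompts are made up):
# Decoder-only models such as CodeGen are usually left-padded for generation.
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "left"
batch = tokenizer(["def hello_world():", "def add(a, b):"], return_tensors="pt", padding=True)
generated_ids = model.generate(
    batch.input_ids,
    attention_mask=batch.attention_mask,  # masks out the padded positions
    pad_token_id=tokenizer.eos_token_id,
    max_length=128,
)
for seq in generated_ids:
    print(tokenizer.decode(seq, skip_special_tokens=True))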