tokenizer.pad_token
Hello team,
When I do:
from transformers import AutoTokenizer
pretrained_model = "EleutherAI/pythia-160m"
tokenizer = AutoTokenizer.from_pretrained(
pretrained_model,
padding_side="left",
cache_dir=pretrained_model+'_tokenizer',
)
print(tokenizer.pad_token)
it seems the pad_token is not set (None is printed).
Setting tokenizer.pad_token = tokenizer.eos_token seems to fix the issue. Is this the same way the padding token was applied during training?
Thank you!
Yes, I think so. You can just set pad_token = eos_token during training as well.
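For reference, a minimal sketch of how the fix plays out at tokenization time (the example sentences are made up; the load args mirror the snippet above):

```python
from transformers import AutoTokenizer

pretrained_model = "EleutherAI/pythia-160m"
tokenizer = AutoTokenizer.from_pretrained(pretrained_model, padding_side="left")

# Pythia's tokenizer ships without a pad token, so reuse EOS for padding.
tokenizer.pad_token = tokenizer.eos_token

# With a pad token set, batch padding works; the shorter sequence is
# padded on the left, and the attention mask zeros out the pad positions.
batch = tokenizer(["Hello team", "A somewhat longer example sentence"], padding=True)
print(batch["input_ids"])
print(batch["attention_mask"])
```

Since this only affects padding (masked-out positions), reusing EOS is harmless for causal LMs like Pythia, as long as the same convention is used consistently at training and inference time.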