
tokenizer.pad_token

Open · vincent317 opened this issue 1 year ago · 1 comment

Hello team,

When I do:

from transformers import AutoTokenizer
pretrained_model = "EleutherAI/pythia-160m"
tokenizer = AutoTokenizer.from_pretrained(
  pretrained_model,
  padding_side="left",
  cache_dir=pretrained_model+'_tokenizer',
)
print(tokenizer.pad_token)

It seems like the pad_token is empty (None is printed).

Setting tokenizer.pad_token = tokenizer.eos_token seems to fix the issue. Is this the same way the padding token was applied during the training process?

Thank you!

vincent317 commented on Apr 05 '24

Yes, I think so. You can just set pad_token = eos_token during training as well.
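
For anyone hitting the same thing, here is a minimal sketch of applying that fix and then padding a batch. The example texts and the expected shapes in the comments are illustrative, not from the original report:

from transformers import AutoTokenizer

pretrained_model = "EleutherAI/pythia-160m"
tokenizer = AutoTokenizer.from_pretrained(pretrained_model, padding_side="left")

# The Pythia tokenizer ships without a pad token, so reuse the EOS token for padding.
tokenizer.pad_token = tokenizer.eos_token

# Hypothetical batch, just to show that padded tokenization now works without an error.
batch = tokenizer(
    ["Hello world", "A slightly longer example sentence"],
    padding=True,
    return_tensors="pt",
)
print(tokenizer.pad_token)       # '<|endoftext|>'
print(batch["input_ids"].shape)  # e.g. torch.Size([2, N]) with left padding

Because padding_side="left", the shorter sequence is padded on the left, which is what you generally want when generating with decoder-only models like Pythia.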

cauchy221 commented on Apr 15 '24