
max_len returns unexpected value

Open JohnGiorgi opened this issue 4 years ago • 0 comments

Hi,

I noticed something odd about the max_len attribute of the tokenizer:

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("allenai/scibert_scivocab_uncased")
print(tokenizer.max_len)  # => 1000000000000000019884624838656

whereas I expected it to be 512, as in:

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
print(tokenizer.max_len)  # => 512

Is this a bug? Or is max_len not the appropriate attribute to use if I want to know the maximum input length for the model?
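For what it's worth, the suspicious number does not look arbitrary (this is an observation, not something confirmed in this thread): it is exactly int(1e30), the VERY_LARGE_INTEGER sentinel that transformers falls back to when a tokenizer's config file records no maximum length. A minimal check:

```python
# The value printed above appears to be transformers' VERY_LARGE_INTEGER
# sentinel, i.e. int(1e30). Because 1e30 is a float, converting it to int
# yields the nearest representable value rather than a clean power of ten,
# which produces exactly the number the tokenizer reports.
sentinel = int(1e30)
print(sentinel)  # 1000000000000000019884624838656
```

If that is what is happening, the model's real limit should still be readable from its config, e.g. via AutoConfig.from_pretrained("allenai/scibert_scivocab_uncased").max_position_embeddings (an assumption on my part; AutoConfig is part of transformers, but I have not verified this checkpoint's stored value).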

JohnGiorgi avatar Jun 18 '20 19:06 JohnGiorgi