max_len returns unexpected value
Hi,

I noticed something weird about the `max_len` attribute of the tokenizer:
```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("allenai/scibert_scivocab_uncased")
print(tokenizer.max_len)  # => 1000000000000000019884624838656
```
Whereas I expected it to be 512, as in:

```python
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
print(tokenizer.max_len)  # => 512
```
Is this a bug? Or is `max_len` not the appropriate attribute to use if I want to know the maximum input length for the model?
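For what it's worth, the position-embedding limit can also be read from the model config. This is just a quick sanity check on my side, assuming `max_position_embeddings` is the field that reflects the actual input limit:

```python
from transformers import AutoConfig

# Read the positional limit straight from the model config
config = AutoConfig.from_pretrained("allenai/scibert_scivocab_uncased")
print(config.max_position_embeddings)  # => 512
```

So the model itself does appear to be limited to 512 tokens, even though the tokenizer reports the huge default value.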