bert_score is tokenizer max length correct?

Open ruiguo-bio opened this issue 10 months ago • 1 comment

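If I use the distilbert-base-uncased model with transformers version 4.40, the tokenizer reports a max length (model_max_length) of 1000000000000000019884624838656 at line 216 of utils.py. Is that value correct? This is the tokenizer object: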

DistilBertTokenizer(name_or_path='distilbert-base-uncased', vocab_size=30522, model_max_length=1000000000000000019884624838656, is_fast=False, padding_side='right', truncation_side='right', special_tokens={'unk_token': '[UNK]', 'sep_token': '[SEP]', 'pad_token': '[PAD]', 'cls_token': '[CLS]', 'mask_token': '[MASK]'}, clean_up_tokenization_spaces=True), added_tokens_decoder={ 0: AddedToken("[PAD]", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True), 100: AddedToken("[UNK]", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True), 101: AddedToken("[CLS]", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True), 102: AddedToken("[SEP]", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True), 103: AddedToken("[MASK]", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True), }

Apr 26 '24 15:04 ruiguo-bio
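
A note on the number itself: 1000000000000000019884624838656 is exactly what Python gives for int(1e30). As far as I know, transformers uses that value as a "no limit configured" placeholder when the tokenizer files do not specify a maximum length, so it is probably not a real limit. DistilBERT's config has max_position_embeddings = 512, so one possible workaround is to clamp to that. The sketch below is only an illustration: effective_max_length is a hypothetical helper, not part of bert_score or transformers, and the sentinel interpretation is an assumption.

```python
# int(1e30) is not exactly 10**30, because 1e30 is stored as a binary float.
# Assumption: this is the placeholder transformers reports when no
# maximum length is set in the tokenizer files.
SENTINEL = int(1e30)


def effective_max_length(model_max_length: int, max_position_embeddings: int) -> int:
    """Hypothetical helper: pick a usable truncation length.

    model_max_length        -- value reported by the tokenizer
    max_position_embeddings -- positional limit from the model config
                               (512 for distilbert-base-uncased)
    """
    # The smaller of the two is always safe for the model; this also
    # covers the case where the tokenizer reports the huge placeholder.
    return min(model_max_length, max_position_embeddings)


if __name__ == "__main__":
    print(SENTINEL)                             # the value from the issue
    print(effective_max_length(SENTINEL, 512))  # clamped to the model limit
    print(effective_max_length(128, 512))       # a real, smaller limit is kept
```

With a real tokenizer and model, the two inputs would come from tokenizer.model_max_length and model.config.max_position_embeddings.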
