
Different Vocab Size Between Tokenizer and Model's Word Embedding Layer

Open · louisowen6 opened this issue 4 years ago · 0 comments

Expected Behavior

The tokenizer's vocabulary size and the number of rows in the BERT model's word embedding layer should be the same.

Actual Behavior

The tokenizer's vocabulary size and the number of rows in the BERT model's word embedding layer are not the same.

Steps to Reproduce the Problem

  1. Load the model: model = AutoModel.from_pretrained('indobenchmark/indobert-base-p1')
  2. Print the model: print(model)

(screenshot: the printed model architecture, including the size of the word embedding layer)

  3. Load the tokenizer: tokenizer = AutoTokenizer.from_pretrained('indobenchmark/indobert-base-p1')
  4. Print the length of the tokenizer: print(len(tokenizer))

(screenshot: the printed tokenizer length)
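For reference, here is a minimal self-contained sketch of the comparison described in the steps above. The checkpoint name is the one from this issue; the embedding lookup uses the standard `transformers` `get_input_embeddings()` API rather than printing the whole model:

```python
from transformers import AutoModel, AutoTokenizer

# Load the model and tokenizer from the same checkpoint (name taken from the issue).
model = AutoModel.from_pretrained('indobenchmark/indobert-base-p1')
tokenizer = AutoTokenizer.from_pretrained('indobenchmark/indobert-base-p1')

# Vocabulary size as seen by the tokenizer (includes any added special tokens).
tokenizer_vocab = len(tokenizer)

# Number of rows in the model's word embedding matrix.
embedding_rows = model.get_input_embeddings().num_embeddings

print(f"tokenizer vocab size:   {tokenizer_vocab}")
print(f"embedding matrix rows:  {embedding_rows}")
print(f"sizes match:            {tokenizer_vocab == embedding_rows}")
```

If the two numbers differ, a common workaround (not necessarily the fix intended by the maintainers) is to call `model.resize_token_embeddings(len(tokenizer))` so the embedding matrix matches the tokenizer before fine-tuning.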

louisowen6 · Jul 23 '21 09:07