Different Vocab Size Between Tokenizer and Model's Word Embedding Layer
Expected Behavior
The tokenizer's vocabulary size and the number of rows in the BERT model's word embedding layer should be the same
Actual Behavior
The tokenizer's vocabulary size and the number of rows in the BERT model's word embedding layer are not the same
Steps to Reproduce the Problem
- Import the classes:
  from transformers import AutoModel, AutoTokenizer
- Load the model:
  model = AutoModel.from_pretrained('indobenchmark/indobert-base-p1')
- Print the model to inspect the word embedding layer size:
  print(model)
- Load the tokenizer:
  tokenizer = AutoTokenizer.from_pretrained('indobenchmark/indobert-base-p1')
- Print the length of the tokenizer and compare it with the embedding size above (a combined check is sketched below):
  print(len(tokenizer))
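
For reference, a minimal sketch that reproduces the comparison in one script, assuming only that the transformers library is installed; it prints len(tokenizer) next to the size of the model's input embedding matrix (the exact numbers depend on the published checkpoint):

from transformers import AutoModel, AutoTokenizer

model = AutoModel.from_pretrained('indobenchmark/indobert-base-p1')
tokenizer = AutoTokenizer.from_pretrained('indobenchmark/indobert-base-p1')

# Number of rows in the word embedding matrix (the model's effective vocab size)
embedding_rows = model.get_input_embeddings().weight.shape[0]

print('model.config.vocab_size:', model.config.vocab_size)
print('embedding matrix rows  :', embedding_rows)
print('len(tokenizer)         :', len(tokenizer))

# If the two values differ, a common generic workaround in transformers
# (not something specific to IndoNLU) is to resize the embedding layer
# to match the tokenizer:
# model.resize_token_embeddings(len(tokenizer))

Note that resizing only aligns the shapes; whether the extra or missing rows matter for downstream use is a separate question for the model authors.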