BioGPT icon indicating copy to clipboard operation
BioGPT copied to clipboard

Tokenizer encodes special tokens as two element list

Open nicolemcnally opened this issue 2 years ago • 0 comments

Using the huggingface implementation of biogpt tokenizer i expected only 1 element but got 2.

from transformers import BioGptTokenizer tokenizer= BioGptTokenizer.from_pretrained('microsoft/biogpt')
tokenizer.encode(tokenizer.eos_token) output: [2, 2]

nicolemcnally avatar Feb 16 '23 12:02 nicolemcnally