alibi
alibi copied to clipboard
AnchorText Language Model - `convert_tokens_to_string` not conform to its signature
The AnchorText
with LangugeModel
does not work with tokenizers v0.12.0. The issue comes from the tokenizer.convert_tokens_to_string
which no longer returns a str
as it signature suggests, but a List[str]
.
To reproduce the error, install transformers==4.17.0
. Test the following script with versions tokenizers==0.11.2
and tokenizers==0.12.0
from transformers import AutoTokenizer
# model_path = 'bert-base-uncased'
model_path = 'distilbert-base-uncased'
tokenizer = AutoTokenizer.from_pretrained(model_path)
# with tokenizers v0.12.0: ['code', 'ing'] -> incorrect as the results should be str
# with tokenizers v0.11.2: 'codeing' -> correct
print(tokenizer.convert_tokens_to_string(['code', '##ing']))
Related issues: https://github.com/huggingface/transformers/issues/16525 https://github.com/huggingface/transformers/issues/16520
PR: https://github.com/huggingface/transformers/pull/16537
0.12
has been yanked from PyPI so shouldn't be an issue for new installs for the time being.