
AnchorText Language Model - `convert_tokens_to_string` does not conform to its signature

Open RobertSamoilescu opened this issue 2 years ago • 1 comments

AnchorText with LanguageModel does not work with tokenizers v0.12.0. The issue comes from `tokenizer.convert_tokens_to_string`, which no longer returns a `str` as its signature suggests, but a `List[str]`.

To reproduce the error, install `transformers==4.17.0`, then run the following script once with `tokenizers==0.11.2` and once with `tokenizers==0.12.0`:

from transformers import AutoTokenizer

# model_path = 'bert-base-uncased'
model_path = 'distilbert-base-uncased'
tokenizer = AutoTokenizer.from_pretrained(model_path)

# with tokenizers v0.12.0: ['code', 'ing'] -> incorrect, the result should be a str
# with tokenizers v0.11.2: 'codeing' -> correct
print(tokenizer.convert_tokens_to_string(['code', '##ing']))
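Until the upstream fix is available, calling code could guard against either return type. The following is only a sketch: `normalize_to_string` and `convert_tokens_to_string_safe` are hypothetical helper names, not part of alibi or transformers, and joining the pieces with an empty string is an assumption based on the `['code', 'ing']` vs `'codeing'` output shown above.

```python
from typing import List, Sequence, Union


def normalize_to_string(result: Union[str, Sequence[str]]) -> str:
    """Return `result` unchanged if it is a str, otherwise join its pieces."""
    if isinstance(result, str):
        return result
    # Assumption: the pieces already carry any whitespace they need,
    # so they are concatenated without a separator.
    return "".join(result)


def convert_tokens_to_string_safe(tokenizer, tokens: List[str]) -> str:
    """Call `tokenizer.convert_tokens_to_string` and always return a str.

    `tokenizer` is any object exposing `convert_tokens_to_string(tokens)`,
    e.g. the tokenizer created in the script above.
    """
    return normalize_to_string(tokenizer.convert_tokens_to_string(tokens))
```

With the tokenizer from the script above, this would be used as `convert_tokens_to_string_safe(tokenizer, ['code', '##ing'])`.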

Related issues: https://github.com/huggingface/transformers/issues/16525 https://github.com/huggingface/transformers/issues/16520

PR: https://github.com/huggingface/transformers/pull/16537

RobertSamoilescu, Apr 01 '22 11:04

tokenizers 0.12 has been yanked from PyPI, so this shouldn't be an issue for new installs for the time being.
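A yanked release can still be installed when it is pinned exactly, so existing environments may be worth checking. A minimal sketch follows: the helper names are hypothetical, and treating only `0.12.0` as affected is an assumption taken from the version named in this issue.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional

# Assumption: only the release named in this issue is affected.
AFFECTED_VERSIONS = {"0.12.0"}


def installed_tokenizers_version() -> Optional[str]:
    """Return the installed tokenizers version, or None if it is not installed."""
    try:
        return version("tokenizers")
    except PackageNotFoundError:
        return None


def is_affected(ver: Optional[str]) -> bool:
    """True if `ver` is one of the versions assumed to be affected."""
    return ver in AFFECTED_VERSIONS


if __name__ == "__main__":
    current = installed_tokenizers_version()
    if is_affected(current):
        print(f"tokenizers=={current} is installed; consider changing the version.")
    else:
        print(f"tokenizers version: {current}")
```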

jklaise, Apr 01 '22 11:04