CTranslate2
Exception when using some T5 model
While trying out some other T5 models that use the T5Tokenizer, I hit an exception during conversion. For example:
    ct2-transformers-converter --model Rostlab/prot_t5_xl_uniref50 --output_dir ./prot-t5-ct2/
The command fails with this exception:
    You're trying to run a Unigram model but you're file was trained with a different algorithm.
I think this happens because the TransformersConverter class sets tokenizer_class = transformers.AutoTokenizer, whereas some models must be loaded with the T5Tokenizer.
My solution is to change this code:
    tokenizer_class = transformers.AutoTokenizer
to:
    if self._model_name_or_path == 'Rostlab/prot_t5_xl_uniref50':
        tokenizer_class = transformers.T5Tokenizer
    else:
        tokenizer_class = transformers.AutoTokenizer
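The same workaround could be written with a lookup table instead of a hard-coded if, so that more models can be added later. This is only a sketch of the idea: TOKENIZER_OVERRIDES, resolve_tokenizer_class_name and resolve_tokenizer_class are hypothetical names that do not exist in CTranslate2, and the only entry in the table is the model from this report.

```python
# Hypothetical override table (not part of CTranslate2): model IDs that
# should not be loaded through AutoTokenizer. Only this one entry is known
# from the report above; anything else would have to be added by hand.
TOKENIZER_OVERRIDES = {
    "Rostlab/prot_t5_xl_uniref50": "T5Tokenizer",
}


def resolve_tokenizer_class_name(model_name_or_path, overrides=TOKENIZER_OVERRIDES):
    """Return the name of the transformers tokenizer class to use.

    Falls back to "AutoTokenizer" for every model without an override.
    """
    return overrides.get(model_name_or_path, "AutoTokenizer")


def resolve_tokenizer_class(model_name_or_path):
    """Look up the actual class object in the transformers package."""
    import transformers  # imported lazily so the name lookup works without it

    return getattr(transformers, resolve_tokenizer_class_name(model_name_or_path))
```

In the converter, the line tokenizer_class = transformers.AutoTokenizer would then become something like tokenizer_class = resolve_tokenizer_class(self._model_name_or_path), assuming the attribute name shown in my change above.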
Environment: Python 3.9.6, Windows 10