CTranslate2

Exception when converting some T5 models

Open Horikitasaku opened this issue 5 months ago • 1 comments

I was trying to convert some other T5 models that use the T5Tokenizer, for example:

ct2-transformers-converter --model Rostlab/prot_t5_xl_uniref50 --output_dir ./prot-t5-ct2/

The conversion fails with this exception: You're trying to run a Unigram model but you're file was trained with a different algorithm.

I noticed this happens because the TransformersConverter class sets tokenizer_class = transformers.AutoTokenizer. However, some models must be loaded with T5Tokenizer.

My workaround was to change this code:

tokenizer_class = transformers.AutoTokenizer

to

if self._model_name_or_path == 'Rostlab/prot_t5_xl_uniref50':
    tokenizer_class = transformers.T5Tokenizer
else:
    tokenizer_class = transformers.AutoTokenizer
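
A less model-specific variant of this workaround might try AutoTokenizer first and fall back to T5Tokenizer only when loading fails. The sketch below is only an illustration of that idea, not what CTranslate2 does: `load_tokenizer_with_fallback` and its `loaders` parameter are hypothetical names, and it catches `Exception` broadly because I have not checked which exception type the Unigram error is raised as.

```python
def load_tokenizer_with_fallback(model_name_or_path, loaders=None):
    """Try each tokenizer loader in order and return the first result.

    Hypothetical helper, not part of CTranslate2 or transformers.
    """
    if loaders is None:
        # Imported lazily so the helper itself has no hard dependency.
        import transformers

        loaders = [
            transformers.AutoTokenizer.from_pretrained,
            # Assumption: the slow T5Tokenizer can load the model's
            # SentencePiece file when AutoTokenizer cannot.
            transformers.T5Tokenizer.from_pretrained,
        ]

    errors = []
    for loader in loaders:
        try:
            return loader(model_name_or_path)
        except Exception as exc:  # exact exception type is an open question
            errors.append(exc)

    raise RuntimeError(
        f"No tokenizer loader succeeded for {model_name_or_path!r}: {errors}"
    )
```

With the default loaders, `load_tokenizer_with_fallback("Rostlab/prot_t5_xl_uniref50")` would only reach T5Tokenizer after AutoTokenizer has raised, so models that already convert correctly should be unaffected.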

Environment: Python 3.9.6, Windows 10

Horikitasaku, Jan 31 '24 02:01