Add default tokenizer for gpt_neox (the same as gpt_neo)
tokenization_auto.py was missing a mapping for gpt_neox, so AutoTokenizer initialization for GPT NeoX failed at runtime:
File ..., in load_tokenizer(model_name_or_path='./gpt-neox-20b', **kwargs={'cache_dir': '.cache/'})
     16 def load_tokenizer(model_name_or_path: str = None, **kwargs) -> AutoTokenizer:
---> 17     return AutoTokenizer.from_pretrained(model_name_or_path, **kwargs)
        model_name_or_path = './gpt-neox-20b'
        kwargs = {'cache_dir': '.cache/'}

File lib/python3.8/site-packages/transformers/models/auto/tokenization_auto.py:525, in AutoTokenizer.from_pretrained(cls=<class 'transformers.models.auto.tokenization_auto.AutoTokenizer'>, pretrained_model_name_or_path='./gpt-neox-20b', *inputs=(), **kwargs={'_from_auto': True, 'cache_dir': '.cache/'})
    522     tokenizer_class = tokenizer_class_from_name(tokenizer_class_candidate)
    524     if tokenizer_class is None:
--> 525         raise ValueError(
    526             f"Tokenizer class {tokenizer_class_candidate} does not exist or is not currently imported."
    527         )
    528     return tokenizer_class.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)
    530 # Otherwise we have to be creative.
    531 # if model is an encoder decoder, the encoder tokenizer class is used by default

ValueError: Tokenizer class GPTNeoXTokenizer does not exist or is not currently imported.
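For context, the fix as originally proposed registers gpt_neox in the TOKENIZER_MAPPING_NAMES table of tokenization_auto.py, reusing gpt_neo's tokenizer classes as the title says. A minimal sketch of that entry, assuming the table's usual (slow tokenizer, fast tokenizer) tuple shape; the surrounding entries and placement are illustrative, not the actual diff:

```python
from collections import OrderedDict

# Illustrative excerpt of TOKENIZER_MAPPING_NAMES from
# transformers/models/auto/tokenization_auto.py. Each value is a
# (slow_tokenizer_class_name, fast_tokenizer_class_name) pair; the real
# table also guards the fast class behind is_tokenizers_available().
TOKENIZER_MAPPING_NAMES = OrderedDict(
    [
        # ... other model types ...
        ("gpt_neo", ("GPT2Tokenizer", "GPT2TokenizerFast")),
        # New entry: reuse gpt_neo's tokenizer classes for gpt_neox.
        ("gpt_neox", ("GPT2Tokenizer", "GPT2TokenizerFast")),
        # ... other model types ...
    ]
)
```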
Thanks! Actually, NeoX doesn't use the GPT2Tokenizer; I'll update the current PR based on this.
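For reference, GPT-NeoX ships its own fast tokenizer class, GPTNeoXTokenizerFast, with no slow counterpart, so the revised entry would presumably point there rather than at the GPT2 classes. A sketch of that direction, not the final diff:

```python
# Revised entry, per the comment above: GPT-NeoX has its own fast tokenizer
# (GPTNeoXTokenizerFast) and no slow one, so None fills the slow slot of the
# (slow, fast) pair.
("gpt_neox", (None, "GPTNeoXTokenizerFast")),
```

With a gpt_neox mapping registered, the failing call from the traceback resolves normally:

```python
from transformers import AutoTokenizer

# Same call as in the traceback; with gpt_neox registered, AutoTokenizer can
# resolve the tokenizer class instead of raising ValueError.
tokenizer = AutoTokenizer.from_pretrained("./gpt-neox-20b", cache_dir=".cache/")
```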