FinGPT
ValueError: Tokenizer class ChatGLMTokenizer does not exist or is not currently imported.
(FinGPT) developer@ai:~/PROJECTS/FinGPT/fingpt/FinGPT_sentiment/instruct-FinGPT$ python ./inference/batchbot_torch.py --path /opt/data/data/THUDM/chatglm2-6b --max_new_tokens 16
Traceback (most recent call last):
File "/home/developer/PROJECTS/FinGPT/fingpt/FinGPT_sentiment/instruct-FinGPT/./inference/batchbot_torch.py", line 147, in
After lots of fiddling, now it does this:
File "/home/developer/PROJECTS/FinGPT/fingpt/FinGPT-v3/benchmark/alexotest.py", line 9, in
Running your Jupyter notebook, I get the following:
File ~/.cache/huggingface/modules/transformers_modules/THUDM/chatglm2-6b/8fd7fba285f7171d3ae7ea3b35c53b6340501ed1/tokenization_chatglm.py:69, in ChatGLMTokenizer.__init__(self, vocab_file, padding_side, clean_up_tokenization_spaces, **kwargs)
     68 def __init__(self, vocab_file, padding_side="left", clean_up_tokenization_spaces=False, **kwargs):
---> 69     super().__init__(padding_side=padding_side, clean_up_tokenization_spaces=clean_up_tokenization_spaces, **kwargs)
     70     self.name = "GLMTokenizer"
     72     self.vocab_file = vocab_file
File ~/mambaforge/envs/FinGPT/lib/python3.10/site-packages/transformers/tokenization_utils.py:366, in PreTrainedTokenizer.__init__(self, **kwargs)
362 self._added_tokens_encoder: Dict[str, int] = {k.content: v for v, k in self._added_tokens_decoder.items()}
364 # 4. If some of the special tokens are not part of the vocab, we add them, at the end.
365 # the order of addition is the same as self.SPECIAL_TOKENS_ATTRIBUTES following tokenizers
--> 366 self._add_tokens(self.all_special_tokens_extended, special_tokens=True)
368 self._decode_use_source_tokenizer = False
File ~/mambaforge/envs/FinGPT/lib/python3.10/site-packages/transformers/tokenization_utils.py:454, in PreTrainedTokenizer._add_tokens(self, new_tokens, special_tokens)
    452 if new_tokens is None:
    453     return added_tokens
--> 454 current_vocab = self.get_vocab().copy()
    455 new_idx = len(current_vocab)  # only call this once, len gives the last index + 1
    456 for token in new_tokens:
File ~/.cache/huggingface/modules/transformers_modules/THUDM/chatglm2-6b/8fd7fba285f7171d3ae7ea3b35c53b6340501ed1/tokenization_chatglm.py:112, in ChatGLMTokenizer.get_vocab(self)
    110 def get_vocab(self):
    111     """ Returns vocab as a dict """
--> 112     vocab = {self._convert_id_to_token(i): i for i in range(self.vocab_size)}
    113     vocab.update(self.added_tokens_encoder)
    114     return vocab
File ~/.cache/huggingface/modules/transformers_modules/THUDM/chatglm2-6b/8fd7fba285f7171d3ae7ea3b35c53b6340501ed1/tokenization_chatglm.py:108, in ChatGLMTokenizer.vocab_size(self)
    106 @property
    107 def vocab_size(self):
--> 108     return self.tokenizer.n_words
AttributeError: 'ChatGLMTokenizer' object has no attribute 'tokenizer'
!pip install protobuf transformers==4.30.2 cpm_kernels "torch>=2.0" gradio mdtex2html sentencepiece accelerate
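For what it's worth, this AttributeError is what chatglm2-6b's bundled tokenization_chatglm.py raises when it runs against a transformers release newer than the 4.30.x it was written for: as the traceback shows, PreTrainedTokenizer.__init__ now calls get_vocab() (via _add_tokens) before ChatGLMTokenizer.__init__ has had a chance to set self.tokenizer. The transformers==4.30.2 pin in the cell above is the usual workaround. A quick sanity check after reinstalling and restarting the kernel (my own sketch, reusing the local snapshot path from earlier; the sample sentence is only an illustration):

import transformers
from transformers import AutoTokenizer

# Confirm the running kernel actually picked up the pinned release.
print(transformers.__version__)  # expect 4.30.2

tokenizer = AutoTokenizer.from_pretrained(
    "/opt/data/data/THUDM/chatglm2-6b",  # same local chatglm2-6b snapshot as above
    trust_remote_code=True,
)
print(tokenizer.tokenize("Stocks rallied after the earnings report."))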