
piece id is out of range

Open chethanwiz opened this issue 1 year ago • 3 comments

Can someone help me with this error, please?

Traceback (most recent call last):
  File "C:\Users\cheth\Music\new chaya\OneReality\OneRealityMemory.py", line 68, in <module>
    ExLlamatokenizer = ExLlamaV2Tokenizer(config)
  File "C:\Python310\lib\site-packages\exllamav2\tokenizer\tokenizer.py", line 192, in __init__
    self.eos_token = (self.tokenizer_model.eos_token() or self.extended_id_to_piece.get(self.eos_token_id, None)) or self.tokenizer_model.id_to_piece(self.eos_token_id)
  File "C:\Python310\lib\site-packages\exllamav2\tokenizer\spm.py", line 43, in id_to_piece
    return self.spm.id_to_piece(idx)
  File "C:\Python310\lib\site-packages\sentencepiece\__init__.py", line 1179, in _batched_func
    return func(self, arg)
  File "C:\Python310\lib\site-packages\sentencepiece\__init__.py", line 1172, in _func
    raise IndexError('piece id is out of range.')
IndexError: piece id is out of range.

chethanwiz • Apr 09 '24 15:04

This is usually caused by conflicting vocabularies in merged models. Would help to know what model this is.
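
A rough way to check for this kind of mismatch is to compare the special-token IDs in the model's config.json against the number of pieces in the SentencePiece model. The sketch below is only an illustration: `out_of_range_special_ids` is a hypothetical helper, and it assumes config.json stores `bos_token_id`, `eos_token_id`, and `pad_token_id` as plain integers. The piece count is passed in by the caller; with the sentencepiece package it can be read via `SentencePieceProcessor(model_file=...).get_piece_size()`.

```python
import json
from pathlib import Path


def out_of_range_special_ids(config_path: str, piece_count: int) -> dict:
    """Return special-token IDs from config.json that the base vocabulary
    cannot resolve (hypothetical helper, not part of exllamav2)."""
    config = json.loads(Path(config_path).read_text(encoding="utf-8"))
    found = {}
    # Assumption: these keys hold plain integers when present.
    for key in ("bos_token_id", "eos_token_id", "pad_token_id"):
        token_id = config.get(key)
        # Valid SentencePiece IDs run from 0 to piece_count - 1.
        if isinstance(token_id, int) and token_id >= piece_count:
            found[key] = token_id
    return found
```

Any ID this returns has to be defined somewhere else, such as an added_tokens.json file, or a lookup in the base tokenizer will fail with the IndexError shown above.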

turboderp • Apr 09 '24 15:04

dolphin-2.1-mistral-7B-GPTQ

chethanwiz • Apr 09 '24 19:04

The model seems to be using the same tokenizer as Mistral, which doesn't define the two ChatML tokens that Dolphin needs. You can try adding an added_tokens.json file to the model directory with this content:

{
  "<|im_end|>": 32000,
  "<|im_start|>": 32001
}
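
If you would rather create the file from a script than by hand, something like the following should work. It is a minimal sketch using only the standard library: `write_added_tokens` is a hypothetical helper, the token IDs are the ones given above, and the model directory is whatever folder holds your copy of the model.

```python
import json
from pathlib import Path

# Token-to-ID mapping taken from the suggestion above.
CHATML_TOKENS = {"<|im_end|>": 32000, "<|im_start|>": 32001}


def write_added_tokens(model_dir: str, tokens: dict = CHATML_TOKENS) -> Path:
    """Create or update added_tokens.json in model_dir (hypothetical helper)."""
    path = Path(model_dir) / "added_tokens.json"
    existing = {}
    if path.exists():
        # Keep any entries that are already defined in the file.
        existing = json.loads(path.read_text(encoding="utf-8"))
    merged = {**existing, **tokens}
    path.write_text(json.dumps(merged, indent=2), encoding="utf-8")
    return path
```

Call it with the path to the model folder, then try loading the tokenizer again.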

turboderp • Apr 09 '24 19:04