llama.cpp
Custom fine-tuned DeepSeek coder model unable to be quantized to Fp16
Hi,
I am trying to quantize my custom fine-tuned deepseek-7b instruct model, and I am unable to do so. I followed the documentation:
# Convert to fp16
fp16 = f"{MODEL_NAME}/{MODEL_NAME.lower()}.fp16.bin"
!python llama.cpp/convert.py {MODEL_NAME} --outtype f16 --outfile {fp16}
but it produces this error:
/content/llama.cpp/gguf-py
Loading model file deepseek-coder-6.7b-instruct-finetuned/model-00001-of-00003.safetensors
Loading model file deepseek-coder-6.7b-instruct-finetuned/model-00001-of-00003.safetensors
Loading model file deepseek-coder-6.7b-instruct-finetuned/model-00002-of-00003.safetensors
Loading model file deepseek-coder-6.7b-instruct-finetuned/model-00003-of-00003.safetensors
params = Params(n_vocab=32256, n_embd=4096, n_layer=32, n_ctx=16384, n_ff=11008, n_head=32, n_head_kv=32, f_norm_eps=1e-06, n_experts=None, n_experts_used=None, rope_scaling_type=<RopeScalingType.LINEAR: 'linear'>, f_rope_freq_base=100000, f_rope_scale=4.0, n_orig_ctx=None, rope_finetuned=None, ftype=<GGMLFileType.MostlyF16: 1>, path_model=PosixPath('deepseek-coder-6.7b-instruct-finetuned'))
Found vocab files: {'tokenizer.model': None, 'vocab.json': None, 'tokenizer.json': PosixPath('deepseek-coder-6.7b-instruct-finetuned/tokenizer.json')}
Loading vocab file 'deepseek-coder-6.7b-instruct-finetuned/tokenizer.json', type 'spm'
Traceback (most recent call last):
File "/content/llama.cpp/convert.py", line 1662, in <module>
main(sys.argv[1:]) # Exclude the first element (script name) from sys.argv
File "/content/llama.cpp/convert.py", line 1618, in main
vocab, special_vocab = vocab_factory.load_vocab(args.vocab_type, model_parent_path)
File "/content/llama.cpp/convert.py", line 1422, in load_vocab
vocab = SentencePieceVocab(
File "/content/llama.cpp/convert.py", line 449, in __init__
self.sentencepiece_tokenizer = SentencePieceProcessor(str(fname_tokenizer))
File "/usr/local/lib/python3.10/dist-packages/sentencepiece/__init__.py", line 447, in Init
self.Load(model_file=model_file, model_proto=model_proto)
File "/usr/local/lib/python3.10/dist-packages/sentencepiece/__init__.py", line 905, in Load
return self.LoadFromFile(model_file)
File "/usr/local/lib/python3.10/dist-packages/sentencepiece/__init__.py", line 310, in LoadFromFile
return _sentencepiece.SentencePieceProcessor_LoadFromFile(self, arg)
RuntimeError: Internal: src/sentencepiece_processor.cc(1101) [model_proto->ParseFromArray(serialized.data(), serialized.size())]
I cannot find similar errors in the GitHub issues. Any insight into this would be greatly appreciated. One can reproduce this by quantizing a DeepSeek 7B instruct coder model.
Reads like a broken tokenizer file? Given the vocab appears not to have been fine-tuned, maybe get the original from here: https://huggingface.co/deepseek-ai/deepseek-coder-6.7b-instruct/tree/main ?
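If the fine-tune really does share the base model's vocabulary, a minimal sketch along these lines (using huggingface_hub; the local directory name is just an example, adjust it to your model folder) would pull the original tokenizer files into the model directory before re-running convert.py:

# Hypothetical sketch: fetch the base model's tokenizer files and place them
# next to the fine-tuned weights (assumes the vocabulary was not changed).
from huggingface_hub import hf_hub_download

MODEL_DIR = "deepseek-coder-6.7b-instruct-finetuned"  # example path, adjust
REPO_ID = "deepseek-ai/deepseek-coder-6.7b-instruct"

for fname in ("tokenizer.json", "tokenizer_config.json"):
    hf_hub_download(repo_id=REPO_ID, filename=fname, local_dir=MODEL_DIR)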
Thanks for your response. However, where do I find the vocab file in that Hugging Face repo? I assume you meant the vocab.json file?
The tokenizer and vocab files; I'm not sure which ones are used. But given the vocabulary is the same in your fine-tune, I'd assume they are identical. You could also double-check your local directory to see whether any of those files are broken.
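As a rough sketch of such a check (paths are examples): tokenizer.json from a fast tokenizer is plain JSON, whereas SentencePieceProcessor expects the binary tokenizer.model protobuf, which would explain the ParseFromArray error above even when tokenizer.json itself is intact.

# Sketch: sanity-check which tokenizer files exist and whether tokenizer.json
# is valid JSON (a fast-tokenizer file) rather than an SPM protobuf.
import json
from pathlib import Path

model_dir = Path("deepseek-coder-6.7b-instruct-finetuned")  # example path, adjust

print("tokenizer.model exists:", (model_dir / "tokenizer.model").exists())
tok_json = model_dir / "tokenizer.json"
if tok_json.exists():
    data = json.loads(tok_json.read_text(encoding="utf-8"))
    print("tokenizer.json parses as JSON; model type:", data.get("model", {}).get("type"))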
The files are not broken. This is an issue for other people as well. In fact, you don't have to quantize a custom DeepSeek model to get this error; if you just quantize the original 7B model, it throws this error too.
Same story with the latest set of DeepSeek Math models.
python convert.py deepseek-math-7b-rl --pad-vocab
Loading model file deepseek-math-7b-rl\pytorch_model-00001-of-000002.bin
Loading model file deepseek-math-7b-rl\pytorch_model-00001-of-000002.bin
Loading model file deepseek-math-7b-rl\pytorch_model-00002-of-000002.bin
params = Params(n_vocab=102400, n_embd=4096, n_layer=30, n_ctx=4096, n_ff=11008, n_head=32, n_head_kv=32, n_experts=None, n_experts_used=None, f_norm_eps=1e-06, rope_scaling_type=None, f_rope_freq_base=10000, f_rope_scale=None, n_orig_ctx=None, rope_finetuned=None, ftype=None, path_model=WindowsPath('deepseek-math-7b-rl'))
Found vocab files: {'tokenizer.model': None, 'vocab.json': None, 'tokenizer.json': WindowsPath('deepseek-math-7b-rl/tokenizer.json')}
Loading vocab file 'deepseek-math-7b-rl\tokenizer.json', type 'spm'
Traceback (most recent call last):
File "D:\Util\llama.cpp\convert.py", line 1478, in
python convert.py deepseek-math-7b-rl --vocab-type hfft --pad-vocab makes a broken model; llama.cpp cannot load it.
python convert.py deepseek-math-7b-rl --vocab-type bpe --pad-vocab makes a loadable model, but it generates a lot of garbage and in general very strange output. Convert shows the following message about vocab generation:
Vocab info: <BpeVocab with 100000 base tokens and 2 added tokens>
Special vocab info: <SpecialVocab with 99757 merges, special tokens {'bos': 100000, 'eos': 100001}, add special tokens {'bos': True, 'eos': False}>
Any insights @jackshiwl ?
Can confirm this issue. Although it converts the model using vocab-type hfft, the model will not load:
llm_load_vocab: SPM vocabulary, but newline token not found: unordered_map::at! Using special_pad_id instead.
llm_load_vocab: mismatch in special tokens definition ( 2387/102400 vs 2400/102400 ).
[...]
terminate called after throwing an instance of 'std::out_of_range'
what(): unordered_map::at
Hi all, I am not investigating this issue anymore; I am using another model. Hope someone can fix this / look into this @cmp-nct
It seems there was a change recently that pins bpe to vocab.json. From the HF docs, it looks like any compatible PreTrainedTokenizer that transformers supports can be represented by tokenizer.json:
https://huggingface.co/docs/transformers/en/fast_tokenizers
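A quick way to see which kind of tokenizer a checkpoint actually ships (a sketch, assuming transformers is installed; the path is an example) is to load it and inspect it:

# Sketch: check whether the checkpoint provides a fast (tokenizer.json-backed)
# tokenizer, which is what convert.py's hfft path relies on.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("deepseek-math-7b-rl")  # local dir or repo id
print(type(tok).__name__, "is_fast =", tok.is_fast)
print("vocab size:", tok.vocab_size, "total tokens:", len(tok))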
3 weeks ago, b2213 convert.py output
Loading vocab file '/ai/models/tokenizer.json', type 'bpe'
Vocab info: <BpeVocab with 32000 base tokens and 22 added tokens>
Special vocab info: <SpecialVocab with 31757 merges, special tokens {'bos': 32013, 'eos': 32014, 'pad': 32014}, add special tokens {'bos': True, 'eos': False}>
current mainline convert.py output
Loading vocab file PosixPath('/ai/models/tokenizer.json'), type 'hfft'
fname_tokenizer: /ai/models
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Vocab info: <HfVocab with 32000 base tokens and 22 added tokens>
Special vocab info: <SpecialVocab with 0 merges, special tokens {'bos': 32013, 'eos': 32014, 'pad': 32014}, add special tokens {'bos': True, 'eos': False}>
Result of running latest main:
llm_load_vocab: SPM vocabulary, but newline token not found: unordered_map::at! Using special_pad_id instead.
llm_load_vocab: mismatch in special tokens definition ( 243/32256 vs 256/32256 ).
llm_load_print_meta: BOS token = 32013 '<|begin▁of▁sentence|>'
llm_load_print_meta: EOS token = 32014 '<|end▁of▁sentence|>'
llm_load_print_meta: UNK token = 0 '!'
llm_load_print_meta: PAD token = 32014 '<|end▁of▁sentence|>'
terminate called after throwing an instance of 'std::out_of_range'
what(): unordered_map::at
Aborted (core dumped)
Result of running main from 3 weeks ago:
llm_load_vocab: mismatch in special tokens definition ( 243/32256 vs 256/32256 ).
llm_load_print_meta: BOS token = 32013 '<|begin▁of▁sentence|>'
llm_load_print_meta: EOS token = 32014 '<|end▁of▁sentence|>'
llm_load_print_meta: PAD token = 32014 '<|end▁of▁sentence|>'
llm_load_print_meta: LF token = 126 'Ä'
We still have our mismatch, but the type is bpe rather than spm. It also produces text as expected (no garbage) rather than a segfault.
Edit: I had another moment, so I tried just copying tokenizer.json to vocab.json and setting vocab-type to bpe.
Loading vocab file PosixPath('/ai/models/vocab.json'), type 'bpe'
/ai/models
Vocab info: <BpeVocab with 32000 base tokens and 22 added tokens>
Special vocab info: <SpecialVocab with 31757 merges, special tokens {'bos': 32013, 'eos': 32014, 'pad': 32014}, add special tokens {'bos': True, 'eos': False}>
I confirmed that both b2213 and the current main's convert.py, if you do the above, generate an f32 with an identical sha256 hash.
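For anyone wanting to script that workaround, a minimal sketch (paths and the llama.cpp checkout location are examples; this just automates the copy-and-convert steps described above):

# Sketch of the workaround reported above: expose tokenizer.json under the
# name the bpe vocab path looks for, then run convert.py with --vocab-type bpe.
import shutil
import subprocess

MODEL_DIR = "/ai/models"  # example: directory with the weights and tokenizer.json

shutil.copyfile(f"{MODEL_DIR}/tokenizer.json", f"{MODEL_DIR}/vocab.json")
cmd = ["python", "llama.cpp/convert.py", MODEL_DIR, "--vocab-type", "bpe"]
# Optionally append "--outtype", "f16" for an fp16 output as in the original post.
subprocess.run(cmd, check=True)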
There's a PR from the deepseek team about this. Basically, their tokenizer needs to be supported in llama.cpp for this to work.
@Nold360 Yeah, I got the same error. Did you find any way to solve it? Thanks. It cannot be quantized with convert-hf-to-gguf.py either.
This issue was closed because it has been inactive for 14 days since being marked as stale.