llama.cpp
Custom fine-tuned DeepSeek coder model unable to be quantized to Fp16
Hi,
I am trying to quantize my custom fine-tuned deepseek-7b instruct model, and I am unable to do so. I followed the documentation:
# Convert to fp16
fp16 = f"{MODEL_NAME}/{MODEL_NAME.lower()}.fp16.bin"
!python llama.cpp/convert.py {MODEL_NAME} --outtype f16 --outfile {fp16}
but it produces this error:
/content/llama.cpp/gguf-py
Loading model file deepseek-coder-6.7b-instruct-finetuned/model-00001-of-00003.safetensors
Loading model file deepseek-coder-6.7b-instruct-finetuned/model-00001-of-00003.safetensors
Loading model file deepseek-coder-6.7b-instruct-finetuned/model-00002-of-00003.safetensors
Loading model file deepseek-coder-6.7b-instruct-finetuned/model-00003-of-00003.safetensors
params = Params(n_vocab=32256, n_embd=4096, n_layer=32, n_ctx=16384, n_ff=11008, n_head=32, n_head_kv=32, f_norm_eps=1e-06, n_experts=None, n_experts_used=None, rope_scaling_type=<RopeScalingType.LINEAR: 'linear'>, f_rope_freq_base=100000, f_rope_scale=4.0, n_orig_ctx=None, rope_finetuned=None, ftype=<GGMLFileType.MostlyF16: 1>, path_model=PosixPath('deepseek-coder-6.7b-instruct-finetuned'))
Found vocab files: {'tokenizer.model': None, 'vocab.json': None, 'tokenizer.json': PosixPath('deepseek-coder-6.7b-instruct-finetuned/tokenizer.json')}
Loading vocab file 'deepseek-coder-6.7b-instruct-finetuned/tokenizer.json', type 'spm'
Traceback (most recent call last):
File "/content/llama.cpp/convert.py", line 1662, in <module>
main(sys.argv[1:]) # Exclude the first element (script name) from sys.argv
File "/content/llama.cpp/convert.py", line 1618, in main
vocab, special_vocab = vocab_factory.load_vocab(args.vocab_type, model_parent_path)
File "/content/llama.cpp/convert.py", line 1422, in load_vocab
vocab = SentencePieceVocab(
File "/content/llama.cpp/convert.py", line 449, in __init__
self.sentencepiece_tokenizer = SentencePieceProcessor(str(fname_tokenizer))
File "/usr/local/lib/python3.10/dist-packages/sentencepiece/__init__.py", line 447, in Init
self.Load(model_file=model_file, model_proto=model_proto)
File "/usr/local/lib/python3.10/dist-packages/sentencepiece/__init__.py", line 905, in Load
return self.LoadFromFile(model_file)
File "/usr/local/lib/python3.10/dist-packages/sentencepiece/__init__.py", line 310, in LoadFromFile
return _sentencepiece.SentencePieceProcessor_LoadFromFile(self, arg)
RuntimeError: Internal: src/sentencepiece_processor.cc(1101) [model_proto->ParseFromArray(serialized.data(), serialized.size())]
I cannot find similar errors in the GitHub issues. Any insight into this would be greatly appreciated. One can reproduce this by quantizing a DeepSeek 7B instruct coder model.
Reads like a broken tokenizer file? Given the vocab appears not to have been fine-tuned, maybe get the original from here: https://huggingface.co/deepseek-ai/deepseek-coder-6.7b-instruct/tree/main ?
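If the fine-tune really does share the base model's vocabulary, a minimal sketch along these lines (using huggingface_hub; the local directory name is just an example, adjust it to your model folder) would pull the original tokenizer files into the model directory before re-running convert.py:

# Hypothetical sketch: fetch the base model's tokenizer files and place them
# next to the fine-tuned weights (assumes the vocabulary was not changed).
from huggingface_hub import hf_hub_download

MODEL_DIR = "deepseek-coder-6.7b-instruct-finetuned"  # example path, adjust
REPO_ID = "deepseek-ai/deepseek-coder-6.7b-instruct"

for fname in ("tokenizer.json", "tokenizer_config.json"):
    hf_hub_download(repo_id=REPO_ID, filename=fname, local_dir=MODEL_DIR)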
Thanks for your response. However, where do I find the vocab file in that Hugging Face repo? I assume you meant the vocab.json file?
The tokenizer and vocab files; I'm not sure which ones are used. But given the vocabulary is the same in your fine-tune, I'd assume they are identical. You could also double-check your local directory to see whether any of those files are broken.
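As a rough sketch of such a check (paths are examples): tokenizer.json from a fast tokenizer is plain JSON, whereas SentencePieceProcessor expects the binary tokenizer.model protobuf, which would explain the ParseFromArray error above even when tokenizer.json itself is intact.

# Sketch: sanity-check which tokenizer files exist and whether tokenizer.json
# is valid JSON (a fast-tokenizer file) rather than an SPM protobuf.
import json
from pathlib import Path

model_dir = Path("deepseek-coder-6.7b-instruct-finetuned")  # example path, adjust

print("tokenizer.model exists:", (model_dir / "tokenizer.model").exists())
tok_json = model_dir / "tokenizer.json"
if tok_json.exists():
    data = json.loads(tok_json.read_text(encoding="utf-8"))
    print("tokenizer.json parses as JSON; model type:", data.get("model", {}).get("type"))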
The files are not broken. This is an issue for other people as well. In fact, you don't have to quantize a custom DeepSeek model to get this error; if you just quantize the original 7B model, it throws this error too.
Same story with the latest set of DeepSeek Math models.
python convert.py deepseek-math-7b-rl --pad-vocab
Loading model file deepseek-math-7b-rl\pytorch_model-00001-of-000002.bin
Loading model file deepseek-math-7b-rl\pytorch_model-00001-of-000002.bin
Loading model file deepseek-math-7b-rl\pytorch_model-00002-of-000002.bin
params = Params(n_vocab=102400, n_embd=4096, n_layer=30, n_ctx=4096, n_ff=11008, n_head=32, n_head_kv=32, n_experts=None, n_experts_used=None, f_norm_eps=1e-06, rope_scaling_type=None, f_rope_freq_base=10000, f_rope_scale=None, n_orig_ctx=None, rope_finetuned=None, ftype=None, path_model=WindowsPath('deepseek-math-7b-rl'))
Found vocab files: {'tokenizer.model': None, 'vocab.json': None, 'tokenizer.json': WindowsPath('deepseek-math-7b-rl/tokenizer.json')}
Loading vocab file 'deepseek-math-7b-rl\tokenizer.json', type 'spm'
Traceback (most recent call last):
File "D:\Util\llama.cpp\convert.py", line 1478, in
python convert.py deepseek-math-7b-rl --vocab-type hfft --pad-vocab makes a broken model; llama.cpp cannot load it.
python convert.py deepseek-math-7b-rl --vocab-type bpe --pad-vocab makes a loadable model, but it generates a lot of garbage and in general very strange output. Convert shows the following message about vocab generation:
Vocab info: <BpeVocab with 100000 base tokens and 2 added tokens>
Special vocab info: <SpecialVocab with 99757 merges, special tokens {'bos': 100000, 'eos': 100001}, add special tokens {'bos': True, 'eos': False}>
Any insights @jackshiwl ?
Can confirm this issue. Although it converts the model using vocab-type hfft, the model will not load:
llm_load_vocab: SPM vocabulary, but newline token not found: unordered_map::at! Using special_pad_id instead.
llm_load_vocab: mismatch in special tokens definition ( 2387/102400 vs 2400/102400 ).
[...]
terminate called after throwing an instance of 'std::out_of_range'
what(): unordered_map::at
Hi all, I am not investigating this issue anymore; I am using another model. Hope someone can fix this / look into this @cmp-nct
It seems there was a change recently that pins bpe to vocab.json. From the HF docs, it looks like any compatible PreTrainedTokenizer that transformers supports can be represented by tokenizer.json:
https://huggingface.co/docs/transformers/en/fast_tokenizers
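A quick way to see which kind of tokenizer a checkpoint actually ships (a sketch, assuming transformers is installed; the path is an example) is to load it and inspect it:

# Sketch: check whether the checkpoint provides a fast (tokenizer.json-backed)
# tokenizer, which is what convert.py's hfft path relies on.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("deepseek-math-7b-rl")  # local dir or repo id
print(type(tok).__name__, "is_fast =", tok.is_fast)
print("vocab size:", tok.vocab_size, "total tokens:", len(tok))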
3 weeks ago, b2213 convert.py output
Loading vocab file '/ai/models/tokenizer.json', type 'bpe'
Vocab info: <BpeVocab with 32000 base tokens and 22 added tokens>
Special vocab info: <SpecialVocab with 31757 merges, special tokens {'bos': 32013, 'eos': 32014, 'pad': 32014}, add special tokens {'bos': True, 'eos': False}>
current mainline convert.py output
Loading vocab file PosixPath('/ai/models/tokenizer.json'), type 'hfft'
fname_tokenizer: /ai/models
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Vocab info: <HfVocab with 32000 base tokens and 22 added tokens>
Special vocab info: <SpecialVocab with 0 merges, special tokens {'bos': 32013, 'eos': 32014, 'pad': 32014}, add special tokens {'bos': True, 'eos': False}>
Result of running latest main:
llm_load_vocab: SPM vocabulary, but newline token not found: unordered_map::at! Using special_pad_id instead.
llm_load_vocab: mismatch in special tokens definition ( 243/32256 vs 256/32256 ).
llm_load_print_meta: BOS token = 32013 '<|begin▁of▁sentence|>'
llm_load_print_meta: EOS token = 32014 '<|end▁of▁sentence|>'
llm_load_print_meta: UNK token = 0 '!'
llm_load_print_meta: PAD token = 32014 '<|end▁of▁sentence|>'
terminate called after throwing an instance of 'std::out_of_range'
what(): unordered_map::at
Aborted (core dumped)
Result of running main from 3 weeks ago:
llm_load_vocab: mismatch in special tokens definition ( 243/32256 vs 256/32256 ).
llm_load_print_meta: BOS token = 32013 '<|begin▁of▁sentence|>'
llm_load_print_meta: EOS token = 32014 '<|end▁of▁sentence|>'
llm_load_print_meta: PAD token = 32014 '<|end▁of▁sentence|>'
llm_load_print_meta: LF token = 126 'Ä'
We still have our mismatch, but the type is bpe rather than spm. It also produces text as expected (no garbage) rather than a segfault.
Edit: I had another moment, so I tried just copying tokenizer.json to vocab.json and setting vocab-type to bpe.
Loading vocab file PosixPath('/ai/models/vocab.json'), type 'bpe'
/ai/models
Vocab info: <BpeVocab with 32000 base tokens and 22 added tokens>
Special vocab info: <SpecialVocab with 31757 merges, special tokens {'bos': 32013, 'eos': 32014, 'pad': 32014}, add special tokens {'bos': True, 'eos': False}>
I confirmed that both b2213 and the current main's convert.py, if you do the above, generate an f32 with an identical sha256 hash.
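For anyone wanting to script that workaround, a minimal sketch (paths and the llama.cpp checkout location are examples; this just automates the copy-and-convert steps described above):

# Sketch of the workaround reported above: expose tokenizer.json under the
# name the bpe vocab path looks for, then run convert.py with --vocab-type bpe.
import shutil
import subprocess

MODEL_DIR = "/ai/models"  # example: directory with the weights and tokenizer.json

shutil.copyfile(f"{MODEL_DIR}/tokenizer.json", f"{MODEL_DIR}/vocab.json")
cmd = ["python", "llama.cpp/convert.py", MODEL_DIR, "--vocab-type", "bpe"]
# Optionally append "--outtype", "f16" for an fp16 output as in the original post.
subprocess.run(cmd, check=True)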
There's a PR from the deepseek team about this. Basically, their tokenizer needs to be supported in llama.cpp for this to work.
@Nold360 Yeah, I got the same error. Did you find any way to solve it? Thanks. It cannot be quantized with convert-hf-to-gguf.py either.
This issue was closed because it has been inactive for 14 days since being marked as stale.