Error making gguf: KeyError: '<|user|>'
System Info
transformers: 4.44.0
llama.cpp: latest
Hi, when I try to make a gguf I get this error:
Traceback (most recent call last):
  File "/home/david/llm/llama.cpp/convert_hf_to_gguf.py", line 4074, in <module>
    main()
  File "/home/david/llm/llama.cpp/convert_hf_to_gguf.py", line 4068, in main
    model_instance.write()
  File "/home/david/llm/llama.cpp/convert_hf_to_gguf.py", line 388, in write
    self.prepare_metadata(vocab_only=False)
  File "/home/david/llm/llama.cpp/convert_hf_to_gguf.py", line 381, in prepare_metadata
    self.set_vocab()
  File "/home/david/llm/llama.cpp/convert_hf_to_gguf.py", line 3713, in set_vocab
    special_vocab._set_special_token("eot", tokenizer.get_added_vocab()["<|user|>"])
                                            ~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^
KeyError: '<|user|>'
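The failing lookup can be reproduced outside the conversion script (a minimal sketch, assuming the local model directory used below; trust_remote_code=True is needed because the THUDM repo ships a custom tokenizer):
from transformers import AutoTokenizer

# load the same tokenizer that convert_hf_to_gguf.py sees
tokenizer = AutoTokenizer.from_pretrained("/home/david/llm/LongWriter-glm4-9b", trust_remote_code=True)
print(tokenizer.get_added_vocab())  # "<|user|>" is missing from this dict, hence the KeyError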
Do you know how to fix this?
On Hugging Face someone else has the same problem:
https://huggingface.co/THUDM/LongWriter-glm4-9b/discussions/1#66bc33eccd16fda66e7caa1f
But I don't know how to apply this solution:
Hi! You can get the token id by tokenizer.get_command("<|user|>").
Is the EOT token even needed?
Thank you!
Who can help?
No response
Information
- [X] The official example scripts
- [X] My own modified scripts
Reproduction
With llama.cpp:
python convert_hf_to_gguf.py /home/david/llm/LongWriter-glm4-9b --outtype f32
Here is the relevant code from convert_hf_to_gguf.py:
special_vocab = gguf.SpecialVocab(dir_model, load_merges=False)
special_vocab.merges = merges
# only add special tokens when they were not already loaded from config.json
special_vocab._set_special_token("eos", tokenizer.get_added_vocab()["<|endoftext|>"])
special_vocab._set_special_token("eot", tokenizer.get_added_vocab()["<|user|>"])
# this one is usually not in config.json anyway
special_vocab._set_special_token("unk", tokenizer.get_added_vocab()["<|endoftext|>"])
special_vocab.add_to_gguf(self.gguf_writer)
Expected behavior
For the conversion to succeed so the model can be quantized.
Hi! You can get the token id by tokenizer.get_command("<|user|>").
Hi, how do I fix it? Thanks!
Have you updated to our most recent model files? Also, please use transformers>=4.43.0.
@bys0318 thank you, it appears that the token id is:
151336
Is this correct?
in llama.cpp:
llm_load_print_meta: general.name = LongWriter Glm4 9b
llm_load_print_meta: EOS token = 151329 '<|endoftext|>'
llm_load_print_meta: UNK token = 151329 '<|endoftext|>'
llm_load_print_meta: PAD token = 151329 '<|endoftext|>'
llm_load_print_meta: LF token = 128 'Ä'
llm_load_print_meta: EOT token = 151336 '[PAD151336]'
llm_load_print_meta: max token length = 1024
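To double-check the id outside llama.cpp (a small sketch, assuming transformers>=4.43 and the model's custom tokenizer code):
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("THUDM/LongWriter-glm4-9b", trust_remote_code=True)
print(tokenizer.get_command("<|user|>"))  # should print 151336 if the id above is right
The '[PAD151336]' label in the log appears to mean only that the token text itself is not in the exported vocab; the id is what the EOT handling uses.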
@echnio I did this (found the lines starting at 3711 in convert_hf_to_gguf.py and replaced them):
# only add special tokens when they were not already loaded from config.json
special_vocab._set_special_token("eos", tokenizer.get_added_vocab()["<|endoftext|>"])
token_id = tokenizer.get_command("<|user|>")
print(token_id)  # 151336
special_vocab._set_special_token("eot", token_id)
# this one is usually not in config.json anyway
special_vocab._set_special_token("unk", tokenizer.get_added_vocab()["<|endoftext|>"])
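A slightly more defensive variant (just a sketch; get_command() is specific to the ChatGLM-style tokenizer these models ship) would only fall back when the token is missing from the added vocab:
# prefer the added-vocab entry when it exists, otherwise ask the tokenizer directly
added_vocab = tokenizer.get_added_vocab()
if "<|user|>" in added_vocab:
    token_id = added_vocab["<|user|>"]
else:
    token_id = tokenizer.get_command("<|user|>")
special_vocab._set_special_token("eot", token_id)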
This is correct. Thanks for sharing!
Thank you very much, the format conversion was successful.
Thanks for the detailed steps! I was able to convert the model. Please find quants at QuantFactory/LongWriter-glm4-9b-GGUF.
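For anyone landing here later, the converted or downloaded GGUF then runs as usual (file name and prompt are illustrative):
./llama-cli -m LongWriter-glm4-9b.Q8_0.gguf -p "Write a 500-word story about a lighthouse." -n 600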