Format conversion issue after downstream SFT
I used my own dataset to fine-tune bitnet-b1.58-2B-4T-bf16 for a downstream task. The saved checkpoint directory is as follows:
path/to/my/ckpt
├── chat_template.jinja
├── config.json
├── generation_config.json
├── model.safetensors
├── optimizer.pt
├── rng_state.pth
├── scheduler.pt
├── special_tokens_map.json
├── tokenizer.json
├── tokenizer_config.json
├── trainer_state.json
└── training_args.bin
Now I'm trying to convert this model to GGUF format, following the README:
python setup_env.py -md models/BitNet-b1.58-2B-4T -q i2_s
(with models/BitNet-b1.58-2B-4T replaced by my actual model path)
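For reference, my actual command looked like this (the checkpoint path is the same one that appears in the traceback below):
python setup_env.py -md /data/personal/liuzhi/projects/LLaMA-Factory/ckpt/bitnet-b1.58-2B-4T-bf16/20250428_ele_v2/checkpoint-1800 -q i2_s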
I also modified setup_env.py to support my local model path, but running the command produces the following error:
INFO:hf-to-gguf:Loading model: checkpoint-1800
INFO:gguf.gguf_writer:gguf: This GGUF file is for Little Endian only
INFO:hf-to-gguf:Set model parameters
INFO:hf-to-gguf:gguf: context length = 4096
INFO:hf-to-gguf:gguf: embedding length = 2560
INFO:hf-to-gguf:gguf: feed forward length = 6912
INFO:hf-to-gguf:gguf: head count = 20
INFO:hf-to-gguf:gguf: key-value head count = 5
INFO:hf-to-gguf:gguf: rope theta = 500000.0
INFO:hf-to-gguf:gguf: rms norm epsilon = 1e-05
INFO:hf-to-gguf:gguf: file type = 0
INFO:hf-to-gguf:Set model tokenizer
Traceback (most recent call last):
File "/data/personal/liuzhi/projects/LLaMA-Factory/BitNet/utils/convert-hf-to-gguf-bitnet.py", line 1165, in <module>
main()
File "/data/personal/liuzhi/projects/LLaMA-Factory/BitNet/utils/convert-hf-to-gguf-bitnet.py", line 1150, in main
model_instance.set_vocab()
File "/data/personal/liuzhi/projects/LLaMA-Factory/BitNet/utils/convert-hf-to-gguf-bitnet.py", line 957, in set_vocab
self._set_vocab_sentencepiece()
File "/data/personal/liuzhi/projects/LLaMA-Factory/BitNet/utils/convert-hf-to-gguf-bitnet.py", line 383, in _set_vocab_sentencepiece
raise FileNotFoundError(f"File not found: {tokenizer_path}")
FileNotFoundError: File not found: /data/personal/liuzhi/projects/LLaMA-Factory/ckpt/bitnet-b1.58-2B-4T-bf16/20250428_ele_v2/checkpoint-1800/tokenizer.model
It seems a tokenizer.model file is required, but no such file exists in the official code or model files. (If I understand correctly, the official checkpoint ships a BPE tokenizer.json, while the converter's _set_vocab_sentencepiece path expects a SentencePiece tokenizer.model.)
Could you please provide some guidance on how to convert my model to GGUF format and run inference with it, like the official model? I'd really appreciate it!
In addition, in setup_env.py I only modified the gen_code() method so that my checkpoint goes down the same path as get_model_name() == "BitNet-b1.58-2B-4T".
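Concretely, the edit was along these lines (a simplified sketch of my local change, not the upstream code; the real gen_code() has more branches):
def gen_code():
    model_name = get_model_name()
    # Upstream only enters the BitNet-b1.58-2B-4T branch for the official
    # model name; this lets my local checkpoint take the same path.
    if model_name in ("BitNet-b1.58-2B-4T", "checkpoint-1800"):
        pass  # same codegen/compile steps as for the official model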
@LiuZhihhxx Have you solved this problem?
Not yet. It seems to be an essential step for downstream applications.
That's right. I've opened a new issue and am waiting for an official reply 😵
Hey, the tokenizer.model is still missing from the official repo; it's impossible to convert to tl2 without it. @BradZhone
So far the problem still exists: the tokenizer.model file is missing.
@LiuZhihhxx
Hello LiuZhihhxx,
I noticed your comment here about your successful fine-tuning of the BitNet-b1.58-2B-4T-bf16 model on your dataset.
I've been trying to fine-tune the same model with a Korean dataset, but I've encountered several challenges, particularly with unstable training loss.
Would you be able to share any insights or sample code related to your successful fine-tuning setup? Your guidance would be incredibly helpful.
Thank you for your time and consideration!
We implemented a standalone script for converting HF models to GGUF. You can find the instructions on how to use it here: https://github.com/microsoft/BitNet/?tab=readme-ov-file#convert-from-safetensors-checkpoints
Let us know if you encounter any issues or have further questions. Thanks for your patience!
Hello @junhuihe-hjh, thank you so much for your helpful comment earlier.
I'm currently trying to fine-tune the BitNet-b1.58-2B-4T model on a custom dataset, and I've run into some confusion about the expected data format.
I tested the following two formats:
- Chat-style format with special tokens: {"text": "<|system|> You are Chaeyeon Kwak's personal assistant.<|eot_id|>\n<|user|> What's my favorite color?<|eot_id|>\n<|assistant|> Your favorite color is blue.<|eot_id|>"}
- Simple Q&A format: {"text": "Q: What's my favorite way to spend a weekend?<|eot_id|>\nA: Your favorite way to spend a weekend is exploring new places outdoors.<|eot_id|>"}
I noticed that the model handles the first format reasonably well, but the second format results in unstable training loss.
Also, since tokens like <|user|> or <|assistant|> don't seem to exist in BitNet's tokenizer, I'm not sure if the first format is even correct.
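For reference, this is the kind of check and formatting I've been experimenting with (just a sketch on my end; I'm assuming the bundled chat_template.jinja reflects the intended prompt format, which is exactly what I'd like to have confirmed):
from transformers import AutoTokenizer

# Assuming the published bf16 checkpoint; a local checkpoint path works too.
tok = AutoTokenizer.from_pretrained("microsoft/bitnet-b1.58-2B-4T-bf16")

# Check whether the role markers are single special tokens or get split apart.
for marker in ["<|user|>", "<|assistant|>", "<|eot_id|>"]:
    print(marker, "->", tok.tokenize(marker))

# Build the training text from the bundled chat template instead of
# hand-writing role tokens, so it matches whatever the model expects.
messages = [
    {"role": "system", "content": "You are Chaeyeon Kwak's personal assistant."},
    {"role": "user", "content": "What's my favorite color?"},
    {"role": "assistant", "content": "Your favorite color is blue."},
]
print(tok.apply_chat_template(messages, tokenize=False))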
Could you clarify what the recommended data format is for fine-tuning BitNet models?
If possible, sharing a working example would be really helpful.
Thanks again for your support!