How can I convert a GGUF model myself, rather than using the official GGUF model?
As far as I can see, there are three recent official BitNet 2B models:

- model1: https://huggingface.co/microsoft/bitnet-b1.58-2B-4T
- model2: https://huggingface.co/microsoft/bitnet-b1.58-2B-4T-bf16
- model3: https://huggingface.co/microsoft/bitnet-b1.58-2B-4T-gguf

My understanding is that model1 is intended for GPU inference, model2 for continual training, and model3 for CPU inference. Is that right?

When I run the demo with model3, it works fine, but if I switch model3 to model2 or model1, things go wrong, just like what is mentioned in https://github.com/microsoft/BitNet/issues/231 and https://github.com/microsoft/BitNet/issues/236. So I'm very curious how you produced the official GGUF model. If I convert it myself, will the performance be the same? I have searched the existing issues, and whenever someone asks how to convert the HF model to a GGUF model, the reply is always "use our official GGUF model". BUT, what if I want to convert your official model2 into model3, how can I do that? Any solutions?
Thanks for the question, we will prepare a conversion script to address this issue.
Thanks for your reply, really looking forward to your updates : )
hello, any update?
The conversion script is now ready. You can find the instructions on how to use it here: https://github.com/microsoft/BitNet/?tab=readme-ov-file#convert-from-safetensors-checkpoints
Let us know if you encounter any issues or have further questions. Thanks for your patience!
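For anyone landing here later, here is a rough sketch of that flow in Python, assuming the helper script lives under `utils/` and takes the checkpoint directory as its only argument; the script name and arguments are assumptions on my part, so treat the README section linked above as the authoritative reference for the exact command:

```python
# Rough sketch of the conversion flow described in the README section linked above.
# The checkpoint repo id comes from this thread; the helper script path and its
# arguments are assumptions, so follow the README for the real command.
import subprocess
from huggingface_hub import snapshot_download

# 1. Download the bf16 safetensors checkpoint (model2 from the question above).
ckpt_dir = snapshot_download(
    repo_id="microsoft/bitnet-b1.58-2B-4T-bf16",
    local_dir="models/bitnet-b1.58-2B-4T-bf16",
)

# 2. Invoke the repo's conversion helper on the downloaded checkpoint
#    (script name is illustrative; see the linked README section for the real one).
subprocess.run(["python", "utils/convert-helper-bitnet.py", ckpt_dir], check=True)
```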
Thanks. My problem has been solved.
Thanks for your excellent work, the model can be converted successfully. But I found that the converted model and the official GGUF model show some slight differences in architecture metadata when running inference, so do the bf16 model and the GGUF model you supply come from the same original model?
- converted model:
- official gguf model:
The difference in metadata fields like `general.architecture` and `general.name` is due to different versions of the gguf library. We used a customized internal version of the library to produce the official file, and these naming changes haven't been synced to the public PyPI release yet.
These are purely metadata differences and have no impact on the model's weights or inference performance. Your converted model will work correctly.
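If you want to double-check this yourself, here is a minimal sketch, assuming the public `gguf` package from PyPI and placeholder file paths, that compares the metadata fields in question and checks whether the tensor layout of the two files matches:

```python
# Minimal sketch: compare GGUF metadata and tensor layout between two files.
# Assumes `pip install gguf`; the file paths below are placeholders.
from gguf import GGUFReader

def read_string_field(reader, key):
    """Return a string-typed metadata field, or None if the key is absent."""
    field = reader.fields.get(key)
    if field is None:
        return None
    # For string fields, `data` indexes the part holding the raw UTF-8 bytes.
    return bytes(field.parts[field.data[0]]).decode("utf-8")

converted = GGUFReader("models/converted/ggml-model-i2_s.gguf")
official = GGUFReader("models/official/ggml-model-i2_s.gguf")

# Metadata such as general.architecture / general.name may differ between
# gguf library versions, but that does not affect inference.
for key in ("general.architecture", "general.name"):
    print(key, "->", read_string_field(converted, key), "vs", read_string_field(official, key))

# What matters for inference is the tensor layout: names, shapes, quant types.
def tensor_layout(reader):
    return {(t.name, tuple(int(d) for d in t.shape), t.tensor_type) for t in reader.tensors}

print("tensor layout identical:", tensor_layout(converted) == tensor_layout(official))
```

If the tensor layout comes out identical, the remaining differences are only the cosmetic metadata described above.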