
[Feature] Is it possible to support training microsoft/bitnet-b1.58-2B-4T?

Open hbj52152 opened this issue 1 year ago • 6 comments

What features would you like to see? Is it related to a problem or a new feature you'd like to see? Please describe.

A new and small model,

microsoft/bitnet-b1.58-2B-4T

https://huggingface.co/microsoft/bitnet-b1.58-2B-4T

Additional context: I notice the following information on Hugging Face: microsoft/bitnet-b1.58-2B-4T-bf16 contains the master weights in BF16 format. Use this only for training or fine-tuning purposes.

hbj52152 avatar Apr 22 '25 06:04 hbj52152

If it works in transformers then it works in Unsloth! Have you tried to see if it works?

shimmyshimmer avatar Apr 22 '25 07:04 shimmyshimmer
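For reference, a minimal "does it load in transformers?" sketch, assuming your installed transformers version supports the BitNet architecture and that the BF16 repo quoted above is the right one for training; this is just a smoke test, not a confirmed recipe:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/bitnet-b1.58-2B-4T-bf16"  # master weights in BF16, per the model card

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

# If loading succeeds, try a tiny generation to confirm the forward pass runs.
inputs = tokenizer("Hello, BitNet!", return_tensors="pt")
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```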

If it works in transformers then it works in Unsloth! Have you tried to see if it works?

Not yet. I'm hesitant to try.

There is some information like the content below on its Hugging Face page, and I am not sure whether training methods like QLoRA or LoRA in Unsloth currently work for it.

Quantization: Native 1.58-bit weights and 8-bit activations (W1.58A8).
Weights are quantized to ternary values {-1, 0, +1} using absmean quantization during the forward pass.
Activations are quantized to 8-bit integers using absmax quantization (per-token).
Crucially, the model was trained from scratch with this quantization scheme, not post-training quantized.

Anyway, I think you're right. If there's no news, I might give it a try over the weekend.

hbj52152 avatar Apr 23 '25 12:04 hbj52152
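To make the quoted scheme concrete, here is a rough sketch of what absmean weight quantization and per-token absmax activation quantization mean; this is illustrative pseudocode in PyTorch, not Microsoft's or Unsloth's actual implementation:

```python
import torch

def absmean_quantize_weights(w: torch.Tensor):
    # Scale by the mean absolute value, then round to ternary {-1, 0, +1}.
    scale = w.abs().mean().clamp(min=1e-5)
    w_q = (w / scale).round().clamp(-1, 1)
    return w_q, scale

def absmax_quantize_activations(x: torch.Tensor):
    # Per-token (last-dim) absmax scaling to signed 8-bit range [-127, 127].
    scale = x.abs().amax(dim=-1, keepdim=True).clamp(min=1e-5) / 127.0
    x_q = (x / scale).round().clamp(-127, 127)
    return x_q, scale
```

The relevant point for LoRA/QLoRA is that these quantized weights come from training-time quantization-aware training, not from a post-training 4-bit scheme like bitsandbytes, which is why it is unclear whether the standard Unsloth paths apply unchanged.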


All you need to do is change the model name in our notebook to this model's name and then run all cells to see if it works. Also, I'm not sure if you need an HF token for that.

shimmyshimmer avatar Apr 26 '25 10:04 shimmyshimmer
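As a hedged sketch of that "change the model name and run all" suggestion, using the FastLanguageModel API from Unsloth's standard LoRA notebooks; whether the BitNet architecture actually loads this way is exactly what needs testing:

```python
from unsloth import FastLanguageModel

# Load the BF16 master-weights repo mentioned in the issue (hypothetical that it loads).
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="microsoft/bitnet-b1.58-2B-4T-bf16",
    max_seq_length=2048,
    load_in_4bit=False,  # weights are natively ternary; skip bnb 4-bit quantization
)

# Attach standard LoRA adapters, as in the regular Unsloth notebooks.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)
```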

Hi, but what should the prompt format look like? Thanks

Shahin-rmz avatar May 10 '25 11:05 Shahin-rmz
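Not an official answer, but a minimal sketch of how prompts are usually built via the tokenizer's chat-template API in transformers, assuming the model repo ships a chat template:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("microsoft/bitnet-b1.58-2B-4T-bf16")

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Explain 1.58-bit quantization in one sentence."},
]

# Render the conversation into the model's expected prompt string.
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
print(prompt)
```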

Is this issue still important to you? Apologies in advance we might have missed this issue as well. For faster response times, please post on our Reddit server - https://www.reddit.com/r/unsloth or our Discord - https://discord.com/invite/unsloth

github-actions[bot] avatar Jul 01 '25 05:07 github-actions[bot]

Hi @hbj52152. Have you tried that already? I am curious whether other LLMs like Qwen3 in a BitNet architecture would work as well.

jvkobb avatar Nov 03 '25 18:11 jvkobb