[Feature] Is it possible to support training microsoft/bitnet-b1.58-2B-4T?
What features would you like to see? Is it related to a problem or a new feature you'd like to see? Please describe.
A new, small model:
microsoft/bitnet-b1.58-2B-4T
https://huggingface.co/microsoft/bitnet-b1.58-2B-4T
Additional context: I noticed there is information from Hugging Face: "microsoft/bitnet-b1.58-2B-4T-bf16: Contains the master weights in BF16 format. Use this only for training or fine-tuning purposes."
If it works in transformers then it works in Unsloth! Have you tried to see if it works?
Not yet, I'm hesitant to try.
There is some information (quoted below) on its Hugging Face page, and I am not sure whether training methods like QLoRA or LoRA in Unsloth currently work for it:
Quantization: Native 1.58-bit weights and 8-bit activations (W1.58A8).
Weights are quantized to ternary values {-1, 0, +1} using absmean quantization during the forward pass.
Activations are quantized to 8-bit integers using absmax quantization (per-token).
Crucially, the model was trained from scratch with this quantization scheme, not post-training quantized.
Anyway, I think your opinion is right. If there's no news, I might give it a try over the weekend.
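For context, here is a rough sketch of what that W1.58A8 scheme does in practice (my own PyTorch illustration, not the model's actual code; the function names are made up):

```python
import torch

def absmean_weight_quant(w: torch.Tensor):
    # Absmean quantization from BitNet b1.58: scale by the mean
    # absolute weight, then round into the ternary set {-1, 0, +1}.
    gamma = w.abs().mean().clamp(min=1e-5)
    w_q = (w / gamma).round().clamp(-1, 1)
    return w_q, gamma  # effective weight is roughly w_q * gamma

def absmax_activation_quant(x: torch.Tensor):
    # Per-token absmax quantization to the signed 8-bit range.
    scale = 127.0 / x.abs().max(dim=-1, keepdim=True).values.clamp(min=1e-5)
    x_q = (x * scale).round().clamp(-128, 127)
    return x_q, scale  # dequantize with x_q / scale
```

Since this quantization happens during the forward pass while the BF16 master weights receive the gradient updates, it is not obvious to me that LoRA/QLoRA adapters on those master weights behave the same as on an ordinary BF16 model.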
All you need to do is change the model name in our notebook to this one and then run all the cells to see if it works. Also, I'm not sure if you need an HF token for that.
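Concretely, the change would look something like this (an untested sketch using Unsloth's usual recipe; the BF16 variant is the one the model card recommends for fine-tuning, and the target module names below are the standard Llama-style ones, which may not match BitNet's layout):

```python
from unsloth import FastLanguageModel

# Untested: swap the BitNet model name into the usual Unsloth setup.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "microsoft/bitnet-b1.58-2B-4T-bf16",  # BF16 master weights
    max_seq_length = 2048,
    load_in_4bit = False,  # master weights are already BF16
)

# Attach LoRA adapters; module names are assumed Llama-style.
model = FastLanguageModel.get_peft_model(
    model,
    r = 16,
    lora_alpha = 16,
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj"],
)
```

If `from_pretrained` loads the architecture at all (i.e. transformers supports it), the rest of the notebook should run unchanged; if not, it will fail at this first step.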
Hi, but what should the prompt session look like? Thanks
Is this issue still important to you? Apologies in advance, we might have missed this issue. For faster response times, please post on our Reddit - https://www.reddit.com/r/unsloth - or our Discord - https://discord.com/invite/unsloth
Hi @hbj52152. Have you tried it already? I am also curious whether other LLMs with the BitNet architecture, like Qwen3, would work.