Feature Request: Support Cerebras BTLM
BTLM is Cerebras's 3B model that matches the performance of many 7B models. Would be amazing to be able to quantize this because it would be so fast and good to run locally. Doesn't quite fit any of the existing architectures because it's based on CerebrasGPT but also uses ALiBi. Blog here: https://www.cerebras.net/machine-learning/btlm-3b-8k-7b-performance-in-a-3-billion-parameter-model/
HuggingFace model here: https://huggingface.co/cerebras/btlm-3b-8k-base
I am trying to give it a go. I have never ported a model before, so this is new to me, but so far it looks fun. I think I have the model conversion working (HF repo), mostly based on convert-cerebras-to-ggml.py.
I have a couple of questions:
- I can't find anything about the .SCB layers. What are those?
- For ALiBi, is the MPT-7B example a good reference? (I put a rough sketch of what I think the ALiBi bias looks like below.)
It would be nice if someone experienced could tell me at a high level what comes next.
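To make sure I understand the ALiBi part, here is a rough Python sketch of the bias I expect gets added to the pre-softmax attention scores. This is not taken from modeling_btlm.py; the exact slope formula and the alibi_bias_max value are my assumptions and still need to be checked against the BTLM code.

```python
import torch

def alibi_bias(n_heads: int, seq_len: int, alibi_bias_max: float = 8.0) -> torch.Tensor:
    """Rough sketch of an MPT/BTLM-style ALiBi bias; verify details against modeling_btlm.py."""
    # Per-head slopes: a geometric sequence, one slope per attention head.
    # Assumption: exact exponent/rounding may differ from the reference implementation.
    head_ids = torch.arange(1, n_heads + 1, dtype=torch.float32)
    slopes = 1.0 / torch.pow(2.0, head_ids * alibi_bias_max / n_heads)
    # Relative position j - i for key j and query i: 0 on the diagonal, negative for past tokens.
    pos = torch.arange(seq_len, dtype=torch.float32)
    rel = pos[None, :] - pos[:, None]                # (seq_len, seq_len)
    rel = rel.clamp(max=0.0)                         # future positions are handled by the causal mask anyway
    return slopes[:, None, None] * rel[None, :, :]   # (n_heads, seq_len, seq_len), added before softmax
```

If I remember correctly, the MPT example applies ggml_alibi to the scaled KQ product, so that is probably the piece to mirror. For reference, a few of the checkpoint tensors and the full module tree: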
transformer.h.0.attn.c_attn.weight (7680, 2560) float16
transformer.h.0.attn.c_attn.bias (7680,) float32
transformer.h.0.attn.c_attn.SCB (7680,) float32
MODEL
BTLMLMHeadModel(
(transformer): BTLMModel(
(wte): Embedding(50257, 2560)
(drop): Dropout(p=0.0, inplace=False)
(h): ModuleList(
(0-31): 32 x BTLMBlock(
(ln_1): LayerNorm((2560,), eps=1e-05, elementwise_affine=True)
(attn): BTLMAttention(
(c_attn): Linear8bitLt(in_features=2560, out_features=7680, bias=True)
(c_proj): Linear8bitLt(in_features=2560, out_features=2560, bias=True)
(attn_dropout): Dropout(p=0.0, inplace=False)
(resid_dropout): Dropout(p=0.0, inplace=False)
)
(ln_2): LayerNorm((2560,), eps=1e-05, elementwise_affine=True)
(mlp): BTLMMLP(
(c_fc): Linear8bitLt(in_features=2560, out_features=6826, bias=True)
(c_fc2): Linear8bitLt(in_features=2560, out_features=6826, bias=True)
(c_proj): Linear8bitLt(in_features=6826, out_features=2560, bias=True)
(act): SwiGLUActivation()
(dropout): Dropout(p=0.0, inplace=False)
)
)
)
(ln_f): LayerNorm((2560,), eps=1e-05, elementwise_affine=True)
(relative_pe): AlibiPositionEmbeddingLayer()
)
(lm_head): Linear(in_features=2560, out_features=50257, bias=False)
)
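To make sure I understand the MLP before writing the ggml graph: from the module dump above I read BTLMMLP as two parallel projections (c_fc and c_fc2), a SwiGLU gate, then c_proj. A minimal sketch of that reading; which of the two branches goes through SiLU is an assumption I still need to confirm against modeling_btlm.py.

```python
import torch
import torch.nn.functional as F

def btlm_mlp(x, c_fc_w, c_fc_b, c_fc2_w, c_fc2_b, c_proj_w, c_proj_b):
    """Sketch of the BTLMMLP forward pass as I understand it from the module dump.

    Assumption: the c_fc2 branch is the SiLU-gated one; swap the branches if
    modeling_btlm.py does it the other way around.
    """
    h1 = F.linear(x, c_fc_w, c_fc_b)        # (..., 6826)
    h2 = F.linear(x, c_fc2_w, c_fc2_b)      # (..., 6826)
    h = h1 * F.silu(h2)                     # SwiGLUActivation: elementwise gate
    return F.linear(h, c_proj_w, c_proj_b)  # back to (..., 2560)
```

The 6826 intermediate size is just what the dump shows.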
Model loading (WIP C++ implementation):
https://huggingface.co/bornjre/btlm-3b-ggml/blob/main/btlm_model_wip.cpp
Sorry for the ping :smiley: @iboB @ggerganov
I'm not familiar with "SCB" tensors - you have to check how they are used in Python and understand their purpose
@bornjre, I think the SCB tensors come from bitsandbytes (https://huggingface.co/blog/hf-bitsandbytes-integration, https://github.com/TimDettmers/bitsandbytes/blob/main/bitsandbytes/nn/modules.py), perhaps as a result of using load_in_8bit=True when loading the model in HF transformers? I don't think they are part of the original model.
The Python implementation of this model can be found at https://huggingface.co/cerebras/btlm-3b-8k-base/blob/main/modeling_btlm.py.
The SCB tensors are a result of HuggingFace-side quantization; they would be handled the same way as for any bitsandbytes-quantized model and can be ignored.
You can see that the SCB tensors are not present in the original checkpoint:
$ curl -sL https://huggingface.co/cerebras/btlm-3b-8k-base/resolve/main/pytorch_model.bin | strings | grep 'transformer.h.0.attn'
transformer.h.0.attn.c_attn.weightq
transformer.h.0.attn.c_attn.biasq&h
transformer.h.0.attn.c_proj.weightq.h
transformer.h.0.attn.c_proj.biasq6h
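In practice that means the converter can either load the checkpoint without load_in_8bit=True (so bitsandbytes never creates the .SCB scale tensors) or simply skip any state-dict key ending in .SCB. A rough sketch of the loading side, assuming the usual HF transformers path (this is illustrative, not taken from convert-cerebras-to-ggml.py):

```python
import torch
from transformers import AutoModelForCausalLM

# Load in float16/float32 instead of load_in_8bit=True, so no .SCB tensors
# are created in the first place.
model = AutoModelForCausalLM.from_pretrained(
    "cerebras/btlm-3b-8k-base",
    trust_remote_code=True,    # BTLM ships its own modeling_btlm.py
    torch_dtype=torch.float16,
)

for name, tensor in model.state_dict().items():
    # Defensive: skip bitsandbytes bookkeeping tensors if a quantized
    # checkpoint is being converted anyway.
    if name.endswith(".SCB"):
        continue
    # ... write `name`, shape, dtype, and tensor data to the ggml file here ...
```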