
Feature Request: Support Cerebras BTLM

Open andersonbcdefg opened this issue 1 year ago • 5 comments

BTLM is Cerebras's 3B model that matches the performance of many 7B models. Would be amazing to be able to quantize this because it would be so fast and good to run locally. Doesn't quite fit any of the existing architectures because it's based on CerebrasGPT but also uses ALiBi. Blog here: https://www.cerebras.net/machine-learning/btlm-3b-8k-7b-performance-in-a-3-billion-parameter-model/

HuggingFace model here: https://huggingface.co/cerebras/btlm-3b-8k-base

andersonbcdefg avatar Aug 02 '23 07:08 andersonbcdefg

I am trying to give it a go. I have never ported any models before, so it's new to me, but so far it looks fun. I think I have model conversion working (HF repo; mostly based on convert-cerebras-to-ggml.py). I have a couple of questions:

  • I can't find anything about the .SCB layers. What are those?
  • For ALiBi, is MPT-7B a good reference? (A rough sketch of the ALiBi bias is included below.)

It would be nice if someone experienced could tell me, at a high level, what comes next.
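
For reference, a minimal numpy sketch of how the ALiBi slopes and additive attention bias are typically computed (as in MPT/BLOOM-style implementations); the head count is illustrative and should be taken from the BTLM config, not from this sketch:

import numpy as np

def alibi_slopes(n_heads: int) -> np.ndarray:
    # m_h = 2^(-8h / n_heads) for h = 1..n_heads (exact for power-of-two head counts)
    return np.array([2.0 ** (-8.0 * h / n_heads) for h in range(1, n_heads + 1)])

def alibi_bias(n_heads: int, seq_len: int) -> np.ndarray:
    # Additive bias of shape (n_heads, seq_len, seq_len): each head adds
    # slope * (j - i), i.e. 0 on the diagonal and increasingly negative further back.
    slopes = alibi_slopes(n_heads)          # (n_heads,)
    pos = np.arange(seq_len)
    dist = pos[None, :] - pos[:, None]      # dist[i, j] = j - i
    return slopes[:, None, None] * dist[None, :, :]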

transformer.h.0.attn.c_attn.weight (7680, 2560) float16
transformer.h.0.attn.c_attn.bias  (7680,) float32
transformer.h.0.attn.c_attn.SCB  (7680,) float32
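
For context, c_attn has out_features = 7680 = 3 x 2560, so the Q, K and V projections appear to be fused along the output dimension (GPT-2 style). Whether a converter keeps the fused tensor (as the ggml gpt-2 example does) or splits it is a design choice; a minimal sketch of splitting it, using placeholder arrays instead of the real checkpoint:

import numpy as np

n_embd = 2560

# Placeholders standing in for transformer.h.0.attn.c_attn.{weight,bias} above.
w_attn = np.zeros((3 * n_embd, n_embd), dtype=np.float16)  # (7680, 2560)
b_attn = np.zeros((3 * n_embd,), dtype=np.float32)         # (7680,)

# Q, K, V assumed to be concatenated along the output dimension.
w_q, w_k, w_v = np.split(w_attn, 3, axis=0)  # each (2560, 2560)
b_q, b_k, b_v = np.split(b_attn, 3, axis=0)  # each (2560,)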

MODEL

BTLMLMHeadModel(
  (transformer): BTLMModel(
    (wte): Embedding(50257, 2560)
    (drop): Dropout(p=0.0, inplace=False)
    (h): ModuleList(
      (0-31): 32 x BTLMBlock(
        (ln_1): LayerNorm((2560,), eps=1e-05, elementwise_affine=True)
        (attn): BTLMAttention(
          (c_attn): Linear8bitLt(in_features=2560, out_features=7680, bias=True)
          (c_proj): Linear8bitLt(in_features=2560, out_features=2560, bias=True)
          (attn_dropout): Dropout(p=0.0, inplace=False)
          (resid_dropout): Dropout(p=0.0, inplace=False)
        )
        (ln_2): LayerNorm((2560,), eps=1e-05, elementwise_affine=True)
        (mlp): BTLMMLP(
          (c_fc): Linear8bitLt(in_features=2560, out_features=6826, bias=True)
          (c_fc2): Linear8bitLt(in_features=2560, out_features=6826, bias=True)
          (c_proj): Linear8bitLt(in_features=6826, out_features=2560, bias=True)
          (act): SwiGLUActivation()
          (dropout): Dropout(p=0.0, inplace=False)
        )
      )
    )
    (ln_f): LayerNorm((2560,), eps=1e-05, elementwise_affine=True)
    (relative_pe): AlibiPositionEmbeddingLayer()
  )
  (lm_head): Linear(in_features=2560, out_features=50257, bias=False)
)
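
The MLP block is the main difference from plain CerebrasGPT/GPT-2: two parallel up-projections combined with SwiGLU (6826 is roughly 8/3 x 2560, the usual SwiGLU sizing). A hedged PyTorch sketch of the forward pass implied by the dump above; which of the two branches receives the SiLU gate should be confirmed against modeling_btlm.py:

import torch
import torch.nn as nn
import torch.nn.functional as F

class SwiGLUMLPSketch(nn.Module):
    def __init__(self, n_embd: int = 2560, n_inner: int = 6826):
        super().__init__()
        self.c_fc = nn.Linear(n_embd, n_inner)
        self.c_fc2 = nn.Linear(n_embd, n_inner)
        self.c_proj = nn.Linear(n_inner, n_embd)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # SwiGLU: elementwise product of a SiLU-gated branch and a plain linear
        # branch, then projection back down to the model width.
        return self.c_proj(F.silu(self.c_fc(x)) * self.c_fc2(x))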

Model loading C++ WIP implementation file:
https://huggingface.co/bornjre/btlm-3b-ggml/blob/main/btlm_model_wip.cpp

bornjre avatar Aug 05 '23 23:08 bornjre

Sorry for the ping :smiley: @iboB @ggerganov

bornjre avatar Aug 05 '23 23:08 bornjre

I'm not familiar with "SCB" tensors - you'll have to check how they are used in Python and understand their purpose.

ggerganov avatar Aug 06 '23 07:08 ggerganov

@bornjre, I think SCB tensors come from bitsandbytes (https://huggingface.co/blog/hf-bitsandbytes-integration, https://github.com/TimDettmers/bitsandbytes/blob/main/bitsandbytes/nn/modules.py), perhaps as a result of using load_in_8bit=True when loading the model in HF transformers? I don't think these tensors are part of the original model.
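
A quick way to check that assumption: load the checkpoint in fp16 without load_in_8bit and look for .SCB keys in the state dict (a sketch; trust_remote_code is needed because BTLM ships its own modeling code):

import torch
from transformers import AutoModelForCausalLM

# Load without bitsandbytes quantization and check for Linear8bitLt bookkeeping
# tensors (*.SCB) in the resulting state dict.
model = AutoModelForCausalLM.from_pretrained(
    "cerebras/btlm-3b-8k-base",
    torch_dtype=torch.float16,
    trust_remote_code=True,
)
print([k for k in model.state_dict() if k.endswith(".SCB")])  # expected: []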

rskuzma avatar Aug 14 '23 13:08 rskuzma

The Python implementation of this model can be found at https://huggingface.co/cerebras/btlm-3b-8k-base/blob/main/modeling_btlm.py.

The SCB tensors are a result of HuggingFace-side quantization; they would be handled the same way as for any bitsandbytes-quantized model and can be ignored here.

You can see the SCB tensors are not present in the model here:

$ curl -sL https://huggingface.co/cerebras/btlm-3b-8k-base/resolve/main/pytorch_model.bin | strings | grep 'transformer.h.0.attn'
transformer.h.0.attn.c_attn.weightq
transformer.h.0.attn.c_attn.biasq&h
transformer.h.0.attn.c_proj.weightq.h
transformer.h.0.attn.c_proj.biasq6h
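
If conversion is done from a bitsandbytes-quantized copy anyway, those bookkeeping tensors can simply be skipped in the conversion loop; a minimal sketch (the file name and loop body are illustrative):

import torch

state_dict = torch.load("pytorch_model.bin", map_location="cpu")
for name, tensor in state_dict.items():
    if name.endswith(".SCB"):
        continue  # bitsandbytes scale tensor, not part of the original model
    # ... write `name` / `tensor` to the ggml file, as in convert-cerebras-to-ggml.py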

xloem avatar Sep 26 '23 18:09 xloem