
Add support for Chameleon

nopperl opened this pull request 1 year ago

This PR adds support for the Chameleon model. For now, this implementation only supports text->text inference and serves as a base for implementing the (more interesting) image->text, text->image and interleaved pipelines. However, such an implementation will probably require some changes to the CLI and internal architecture, so I suggest doing this in a separate PR.

Chameleon is based on the Llama-2 architecture with the following changes:

  • different (pre-)tokenizer
  • qk-norm
  • swin-norm
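To make the qk-norm change concrete: the queries and keys of each attention head are RMS-normalized before RoPE/attention. This is a minimal NumPy sketch of that idea, not llama.cpp's actual code; the function names and shapes are illustrative.

```python
import numpy as np

def rms_norm(x, weight, eps=1e-5):
    # RMSNorm: divide by the root-mean-square over the last axis, then scale
    return x / np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps) * weight

def qk_norm(q, k, q_weight, k_weight):
    # qk-norm: RMS-normalize the query and key heads before RoPE/attention
    return rms_norm(q, q_weight), rms_norm(k, k_weight)
```

With unit weights, each normalized head vector ends up with an RMS of (approximately) 1, which is what stabilizes attention logits in Chameleon.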

Note 1: to enable text->text inference, the image token logits are suppressed, similar to the HF implementation. This needs to be removed when support for images is added.
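The suppression amounts to masking the logits of all image-token ids before sampling, so decoding can only produce text tokens. A minimal sketch of that masking (illustrative only; the real token-id range and code live in the PR):

```python
import numpy as np

def suppress_image_tokens(logits, image_token_ids):
    # Set image-token logits to -inf so greedy or sampled decoding
    # can never pick an image token (text->text-only mode)
    out = logits.copy()
    out[..., image_token_ids] = -np.inf
    return out
```

Removing this mask is the first step toward the interleaved text/image pipelines mentioned above.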

Note 2: I implemented swin-norm, but I haven't tested it yet, as it is only used by Chameleon-30B.
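For reference, swin-norm changes where the normalization sits relative to the residual add: instead of the Llama-2 pre-norm ordering (normalize the sublayer's input), the norm is applied to the sublayer's output before it is added back. A toy sketch of the two orderings, assuming identity-shaped sublayers (not the actual ggml graph):

```python
import numpy as np

def block_pre_norm(x, attn, ffn, norm1, norm2):
    # Llama-2 style pre-norm: normalize the *input* of each sublayer
    x = x + attn(norm1(x))
    x = x + ffn(norm2(x))
    return x

def block_swin_norm(x, attn, ffn, norm1, norm2):
    # swin-norm: normalize the *output* of each sublayer before the
    # residual add (the ordering used by Chameleon-30B)
    x = x + norm1(attn(x))
    x = x + norm2(ffn(x))
    return x
```

Since the 7B checkpoint uses the pre-norm path, only the second ordering is untested here.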

To test it:

git clone https://huggingface.co/facebook/chameleon-7b
./convert-hf-to-gguf.py chameleon-7b
build/bin/llama-cli -m chameleon-7b/ggml-model-f16.gguf --temp 0.8 -s 1000 -n 50 -p "Language modeling is " -ngl 33

Output:

Language modeling is “the task of predicting the next word in a sequence of text, given the previous words.”

To implement a language model, we can use a neural network with a bidirectional LSTM layer and a softmax output layer.

Reference (requires transformers>=4.43.0.dev0):

from transformers import AutoModelForCausalLM, AutoTokenizer, set_seed
set_seed(1000)
model = AutoModelForCausalLM.from_pretrained("facebook/chameleon-7b", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("facebook/chameleon-7b")
prompt = "Language modeling is "
inputs = tokenizer.encode(prompt, return_tensors="pt").to(model.device)
out = model.generate(inputs, max_new_tokens=40)
print(tokenizer.decode(out[0]))

Reference output:

Language modeling is “the task of predicting the next word in a sequence of text given the previous words.”

In other words, it's a machine learning model that takes a sequence of text as input

Partially addresses #7995.

nopperl avatar Jul 17 '24 14:07 nopperl

I have uploaded GGUFs to test this PR with here.

nopperl avatar Jul 17 '24 14:07 nopperl

will this ever get added :(

nate-lrt avatar Sep 26 '24 00:09 nate-lrt

I think it would still be a good addition. I've resolved all conflicts with master now, so it should be ready to merge.

nopperl avatar Sep 26 '24 11:09 nopperl

Thank you @nopperl looks like it got merged!

arch-btw avatar Sep 28 '24 15:09 arch-btw

@nopperl any plans to tackle image->text and text->image?

MasterScrat avatar Dec 05 '24 15:12 MasterScrat

@MasterScrat currently no plans, sorry for the late reply. AFAIK multimodal support would require a refactor of llama.cpp (https://github.com/ggerganov/llama.cpp/issues/8010#issuecomment-2376339571). I'd love to work on it, but don't have the time right now.

nopperl avatar Dec 19 '24 15:12 nopperl