llama.cpp
llama : add T5 (encoder-decoder) support
I'm still not familiar with the details, but it seems it would be useful to support this architecture in llama.cpp. First, we need to decide on the API and see what changes would be necessary.
See discussion here: https://github.com/ggerganov/llama.cpp/issues/247
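To make the architectural difference concrete, here is a toy sketch of the encoder-decoder inference flow that T5 uses, in contrast to llama.cpp's current decoder-only loop. The function names and logic are purely illustrative stand-ins, not llama.cpp API:

```python
# Toy sketch of T5-style encoder-decoder inference.
# All names are illustrative; none of this is llama.cpp API.

def encode(input_tokens):
    # A real encoder runs bidirectional self-attention over the whole
    # input at once. Toy stand-in: one "hidden state" per input token.
    return [t * 2 for t in input_tokens]

def decode_step(encoder_states, decoded_so_far):
    # A real decoder step runs causal self-attention over the tokens
    # decoded so far, plus cross-attention into the encoder states.
    # Toy stand-in: a deterministic function of both inputs.
    return (sum(encoder_states) + len(decoded_so_far)) % 100

def generate(input_tokens, n_new_tokens):
    # The key structural point: the encoder runs exactly once per
    # input, then the decoder runs autoregressively, one token per step.
    states = encode(input_tokens)
    out = []
    for _ in range(n_new_tokens):
        out.append(decode_step(states, out))
    return out
```

The API question in the issue is essentially where this one-time encode pass fits: a decoder-only model only needs the autoregressive loop, while T5 needs a separate encoder call whose output is fed to every decoder step.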
@ggerganov Does this mean llama.cpp could support something like the new GritLM model, which can handle both text representation (embeddings) and text generation? I tried the embedding example with GritLM, but the resulting embeddings don't look right.
Some references: https://github.com/ContextualAI/gritlm/blob/92025b16534712b31b3c4aaaf069350e222bd5f8/gritlm/gritlm.py#L93 https://huggingface.co/GritLM/GritLM-7B
This issue is about a different architecture (encoder + decoder). GritLM looks like a decoder-only Mistral fine-tune, so it should already work. If you think the results are not OK, you can open an issue with steps to reproduce.
I am looking forward to this. How much work would be needed to implement it?
@dranger003 Probably that's because GritLM uses two prompt templates: one for text generation and one for embeddings. Can you try embedding with the template specified by the author?
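For reference, the two formats can be sketched as below. The template strings follow the gritlm.py file linked earlier (the `<|embed|>` / `<|assistant|>` markers); treat the exact strings as an assumption and verify them against the GritLM repo:

```python
# Sketch of GritLM's two prompt formats, based on the linked gritlm.py.
# The exact template strings are an assumption; check the GritLM repo.

def embed_prompt(instruction: str, text: str) -> str:
    # Embedding path: the instruction block ends with <|embed|>;
    # the text to embed follows it.
    prefix = (f"<|user|>\n{instruction}\n<|embed|>\n"
              if instruction else "<|embed|>\n")
    return prefix + text

def generation_prompt(message: str) -> str:
    # Generation path: a Zephyr-style chat turn ending with <|assistant|>.
    return f"<|user|>\n{message}\n<|assistant|>\n"
```

Feeding the generation template to the embedding path (or vice versa) would plausibly explain embeddings that "don't look right" while the model itself works fine.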
Feel free to open a dedicated issue to discuss this in detail.
@ngxson thanks, I used the proper template. I opened an issue with a sample program.
T5 support would be truly awesome, opening up many enterprise use cases.
This issue was closed because it has been inactive for 14 days since being marked as stale.
Any update on this issue?