fairydreaming
I added a branch with my T5 implementation: https://github.com/fairydreaming/llama.cpp/tree/t5

This is still a work in progress. For now I modified main.cpp to include a llama_encode() call and pass the computed encoder embeddings...
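For context, a minimal sketch of what that modification amounts to, assuming the llama_encode() API shape from the branch (the encode_prompt helper is hypothetical, and the exact llama_batch_get_one() signature has varied across llama.cpp versions):

```cpp
#include <cstdio>
#include <vector>

#include "llama.h"

// hypothetical helper: run the encoder once over the tokenized prompt, so
// that later llama_decode() calls can cross-attend to the stored encoder output
static bool encode_prompt(llama_context * ctx, std::vector<llama_token> & tokens) {
    // note: llama_batch_get_one() arguments differ between llama.cpp versions
    llama_batch batch = llama_batch_get_one(tokens.data(), (int32_t) tokens.size());

    if (llama_encode(ctx, batch) != 0) {
        fprintf(stderr, "llama_encode() failed\n");
        return false;
    }

    return true;
}
```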
@ggerganov Good advice, I did that and it definitely simplified things. I also added an is_encoding flag in the context to avoid passing additional parameters. I still need to research how batches...
@ggerganov Do you think it's better to create a separate example for encoder-decoder models, or to modify the llama-cli command to include a llama_encode() call like I did in my branch? In...
> It seems ok to merge into the existing `llama-cli` example - we can revisit later.
>
> 1. Maybe `bool llama_model_has_encoder()` seems simpler?

OK

> 2. Not sure about...
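To illustrate the first point, a hedged sketch of how a caller might use such a check (llama_model_has_encoder() and llama_model_decoder_start_token() are the names added by these PRs; the start_token helper, the BOS fallback, and the surrounding structure are my assumptions):

```cpp
#include <cstdio>

#include "llama.h"

// sketch: run the encoder pass only for models that have one, and return the
// token that seeds autoregressive decoding afterwards
static llama_token start_token(llama_model * model, llama_context * ctx, llama_batch & prompt) {
    if (!llama_model_has_encoder(model)) {
        // decoder-only models decode the prompt directly; no special start token needed
        return llama_token_bos(model);
    }

    if (llama_encode(ctx, prompt) != 0) {
        fprintf(stderr, "llama_encode() failed\n");
    }

    llama_token dec_start = llama_model_decoder_start_token(model);
    if (dec_start == -1) {
        dec_start = llama_token_bos(model); // assumed fallback when no explicit start token is defined
    }

    return dec_start; // first token fed to llama_decode()
}
```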
I decided to add T5 support in a series of smaller PRs instead of one giant PR, to facilitate code review and merging. The first PR is #8055; it adds...
The first PR is now merged; I created another one adding the Unigram tokenizer: #8089
The second PR is merged and the third one is ready: #8141. After this it will be possible to use T5 models with `llama-cli`. ~~There are still some problems with the CUDA backend,...
The third (and final) PR is now merged. TODO some day: add support for encoder-decoder models in llama-server.
> Does this code land in llama.dll, cause I use llama-cpp-python which uses llama.dll

@Sadeghi85 I suppose the new code is there, but to use encoder-decoder models like T5 you...
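Since bindings like llama-cpp-python only see the C API, here is a rough, non-authoritative sketch of the call sequence a consumer would need for a T5-style model (the generate helper is hypothetical, greedy sampling is used for brevity, and function names reflect the llama.h API around the time of these PRs; signatures may have changed since):

```cpp
#include <cstdio>
#include <vector>

#include "llama.h"

// rough sketch of the extra steps needed for encoder-decoder models:
// 1) llama_encode() over the prompt, 2) decode starting from the decoder
// start token rather than from the prompt itself
static void generate(llama_model * model, llama_context * ctx, std::vector<llama_token> prompt, int n_max) {
    llama_batch batch = llama_batch_get_one(prompt.data(), (int32_t) prompt.size());
    if (llama_encode(ctx, batch) != 0) {
        fprintf(stderr, "llama_encode() failed\n");
        return;
    }

    llama_token tok = llama_model_decoder_start_token(model);
    if (tok == -1) {
        tok = llama_token_bos(model); // assumed fallback
    }

    for (int i = 0; i < n_max; i++) {
        llama_batch dec = llama_batch_get_one(&tok, 1);
        if (llama_decode(ctx, dec) != 0) {
            fprintf(stderr, "llama_decode() failed\n");
            return;
        }

        // greedy pick over the logits of the last decoded token
        const float * logits = llama_get_logits_ith(ctx, -1);
        const int n_vocab   = llama_n_vocab(model);

        tok = 0;
        for (int v = 1; v < n_vocab; v++) {
            if (logits[v] > logits[tok]) {
                tok = v;
            }
        }

        if (tok == llama_token_eos(model)) {
            break;
        }
        // detokenize and print tok here
    }
}
```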
I noticed that the Arctic model doesn't use bias tensors, so I removed the usage of bias tensors from the LLM_ARCH_ARCTIC-related code (they were all null anyway).