fairydreaming


I added a branch with my T5 implementation: https://github.com/fairydreaming/llama.cpp/tree/t5 This is still a work in progress. For now I modified main.cpp to include a llama_encode() call and pass the computed encoder embeddings...

@ggerganov Good advice, I did that and it definitely simplified things; I also added an is_encoding flag in the context to avoid passing additional parameters. I still need to research how batches...

@ggerganov Do you think it's better to create a separate example for encoder-decoder models, or to modify the llama-cli command to include a llama_encode() call like I did in my branch? In...

> It seems ok to merge into the existing `llama-cli` example - we can revisit later.
>
> 1. Maybe `bool llama_model_has_encoder()` seems simpler?

OK

> 2. Not sure about...
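The flow discussed above can be sketched against the llama.cpp C API as it stood after these changes. This is a hedged sketch, not the actual main.cpp diff: it assumes the 4-argument `llama_batch_get_one()` of that era (the signature has since changed), and omits error handling and the sampling loop.

```cpp
#include "llama.h"
#include <vector>

// Sketch: prepare an encoder-decoder model (e.g. T5) for generation.
// Returns the tokens the decoder loop should start from.
static std::vector<llama_token> run_encoder_if_needed(
        llama_model * model, llama_context * ctx,
        std::vector<llama_token> prompt_tokens) {
    if (llama_model_has_encoder(model)) {
        // Encode the whole prompt once; the computed encoder embeddings
        // stay in the context and feed the decoder's cross-attention.
        llama_encode(ctx, llama_batch_get_one(
            prompt_tokens.data(), (int32_t) prompt_tokens.size(), 0, 0));

        // The decoder does not consume the prompt directly; it starts
        // from the model's decoder start token (the pad token for T5).
        llama_token dec_start = llama_model_decoder_start_token(model);
        if (dec_start == -1) {
            dec_start = llama_token_bos(model);
        }
        return { dec_start };
    }
    // Decoder-only models keep the usual prompt-then-decode flow.
    return prompt_tokens;
}
```

Keeping this inside `llama-cli` (rather than a separate example) means decoder-only models take the early-return path and are unaffected.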

I decided to add T5 support in a series of smaller PRs instead of one giant PR, to facilitate code review and merging. The first PR is #8055; it adds...

The first PR is now merged, and I created another one adding the Unigram tokenizer: #8089

The second PR is merged, and the third one is ready: #8141. After this it will be possible to use T5 models with `llama-cli`. ~~There are still some problems with the CUDA backend,...

The third (and final) PR is now merged. TODO some day: add support for encoder-decoder models in llama-server.

> Does this code land in llama.dll, cause I use llama-cpp-python which uses llama.dll

@Sadeghi85 I suppose the new code is there, but to use encoder-decoder models like T5 you...

I noticed that the arctic model doesn't use bias tensors, so I removed the usage of bias tensors from the LLM_ARCH_ARCTIC-related code (they were all null anyway).