fairydreaming

85 comments of fairydreaming

@kyteinsky lol, didn't notice that all this time, sorry

@Sadeghi85 One thing missing from your code is the preparation of the input to the `decode()` call. Check how it's done in the llama-cli source code: https://github.com/ggerganov/llama.cpp/blob/b841d0740855c5af1344a81f261139a45a2b39ee/examples/main/main.cpp#L536-L552 So before calling...
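
For context, a minimal sketch of that preparation step, loosely following the linked `main.cpp` lines and assuming the llama.cpp C API around that commit (`llama_model_has_encoder`, `llama_encode`, `llama_model_decoder_start_token`); the exact `llama_batch_get_one` signature differs between llama.cpp versions:

```cpp
// Sketch: preparing decoder input for an encoder-decoder (T5) model.
#include "llama.h"
#include <vector>

void prepare_t5_decoder_input(llama_model * model, llama_context * ctx,
                              std::vector<llama_token> & embd_inp) {
    if (llama_model_has_encoder(model)) {
        // 1. run the encoder over the whole tokenized prompt
        if (llama_encode(ctx, llama_batch_get_one(embd_inp.data(), (int32_t) embd_inp.size()))) {
            return; // encoder failed
        }
        // 2. the decoder must start from the model's decoder start token
        llama_token decoder_start_token_id = llama_model_decoder_start_token(model);
        if (decoder_start_token_id == -1) {
            decoder_start_token_id = llama_token_bos(model);
        }
        // 3. replace the prompt with the decoder start token;
        //    this single token is what the decode() call should see first
        embd_inp.clear();
        embd_inp.push_back(decoder_start_token_id);
    }
}
```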

I see there are serious problems with using T5, so I added a branch with a high-level example of inference with a T5 model: https://github.com/fairydreaming/llama-cpp-python/tree/t5 There is also a second branch...

@yugaljain1999 Yes, you can pass multiple prompts. I don't know how it works in the llama-cpp-python high-level API, but in llama.cpp (low-level API) you do it by creating a batch containing...
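
A rough sketch of the batch idea, assuming the llama.cpp C API (`llama_batch_init` and the public `llama_batch` fields, which may differ slightly between versions); the `add_prompt` helper is hypothetical, just for illustration:

```cpp
// Sketch: packing two prompts into one llama_batch, each under its own
// sequence id, so a single llama_decode() call processes both.
#include "llama.h"
#include <vector>

// hypothetical helper, not part of llama.cpp
static void add_prompt(llama_batch & batch, const std::vector<llama_token> & tokens,
                       llama_seq_id seq_id) {
    for (size_t i = 0; i < tokens.size(); ++i) {
        const int idx = batch.n_tokens++;
        batch.token[idx]     = tokens[i];
        batch.pos[idx]       = (llama_pos) i;               // position within this sequence
        batch.n_seq_id[idx]  = 1;
        batch.seq_id[idx][0] = seq_id;                      // which prompt this token belongs to
        batch.logits[idx]    = (i == tokens.size() - 1);    // logits only for the last token
    }
}

int decode_two_prompts(llama_context * ctx,
                       const std::vector<llama_token> & prompt_a,
                       const std::vector<llama_token> & prompt_b) {
    llama_batch batch = llama_batch_init(/*n_tokens=*/512, /*embd=*/0, /*n_seq_max=*/2);
    add_prompt(batch, prompt_a, /*seq_id=*/0);
    add_prompt(batch, prompt_b, /*seq_id=*/1);
    const int ret = llama_decode(ctx, batch); // evaluates both sequences at once
    llama_batch_free(batch);
    return ret;
}
```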

@yugaljain1999 T5 models are still not supported in llama-server