Andrei

177 comments of Andrei

That's the approach I was initially trying, but it caused [this assert](https://github.com/ggerganov/llama.cpp/blob/73bac2b11d7d3e20982fc9ee607625836387db8b/llama.cpp#L12293) to fail, as the logits aren't reserved when `cparams.causal_attn` is false. However, I think I was just missing...

@ggerganov I was able to come back to this and finally get it working. Changes: - Added a `llama_token_inp_embd` function to the `llama.h` API which translates a set of input tokens...
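Conceptually, translating input tokens into input embeddings is a row lookup in the model's embedding matrix. A minimal Python sketch of that idea (illustrative only — this is not the actual `llama.h` signature, and the matrix values are made up):

```python
# Toy embedding matrix: one row per token id, 2-dimensional embeddings.
# (Values are invented for illustration.)
embedding_matrix = [
    [0.1, 0.2],  # token 0
    [0.3, 0.4],  # token 1
    [0.5, 0.6],  # token 2
]

def tokens_to_embeddings(tokens):
    """Map each token id to its embedding row."""
    return [embedding_matrix[t] for t in tokens]

print(tokens_to_embeddings([2, 0]))  # [[0.5, 0.6], [0.1, 0.2]]
```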

@ggerganov no problem, I'll work with @ngxson and see if I can provide support on that PR.

Hey @ggerganov I missed this earlier. Thank you, yeah I just need some quick clarifications around the kv cache behaviour. The following is my understanding of the `kv_cache` implementation -...
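To make my mental model concrete, here is a simplified sketch of a per-position KV cache (illustrative only — the real llama.cpp cache also tracks sequence ids, cell allocation, and shifting):

```python
# Simplified KV cache: each decoded token appends its key/value tensors,
# so later decode steps attend over cached positions instead of
# recomputing attention for earlier tokens.
class KVCache:
    def __init__(self):
        self.keys = []    # one entry per cached token position
        self.values = []

    def append(self, k, v):
        self.keys.append(k)
        self.values.append(v)

    def __len__(self):
        return len(self.keys)

cache = KVCache()
# Decoding three tokens fills three cache slots.
for k, v in [([1.0], [0.1]), ([2.0], [0.2]), ([3.0], [0.3])]:
    cache.append(k, v)
print(len(cache))  # 3
```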

Hi @agunapal sorry to get to this so late, are you setting the `n_gpu_layers` parameter? This is required to offload layers to the GPU and is off by default.
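For reference, a hedged example of offloading layers from llama.cpp's CLI (the flag is `-ngl`/`--n-gpu-layers`; the model path is a placeholder and 35 is just an example layer count — pick a value that fits your model and VRAM):

```shell
# Offload 35 transformer layers to the GPU; 0 (the default) keeps
# everything on the CPU.
./main -m models/llama-2-7b.Q4_K_M.gguf -ngl 35 -p "Hello"
```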

@agunapal thanks for providing that. It looks like the issue might actually be with llama.cpp / your version of Metal, as it's only happening when the Metal kernel file is...

@agunapal yeah that's very strange, can you post the top part of the `./main` output where it's setting up? You can also build `llama.cpp` as a shared library with `cmake...
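In case it helps, a sketch of the shared-library build (flag names as they were in llama.cpp's CMake around this time — the Metal option in particular may differ across versions):

```shell
# BUILD_SHARED_LIBS produces libllama as a shared library instead of
# a static archive; LLAMA_METAL enables the Metal backend on macOS.
cmake -B build -DBUILD_SHARED_LIBS=ON -DLLAMA_METAL=ON
cmake --build build --config Release
```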

@agunapal try setting `n_gpu_layers` to 1 now.

Hey @BlackLotus you're exactly right, the `choices` key actually comes from the [OpenAI API](https://platform.openai.com/docs/api-reference/completions/object) but it's unused in this library at the moment. I'm currently working on the multi-completion feature...
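For anyone curious, `choices` follows the OpenAI completion object shape. A minimal sketch of reading it (field names are from the OpenAI API reference; the values here are made up):

```python
import json

# A completion response in the OpenAI-compatible shape; with a single
# completion, only choices[0] is populated.
response = json.loads("""
{
  "id": "cmpl-xxx",
  "object": "text_completion",
  "choices": [
    {"text": "Hello!", "index": 0, "logprobs": null, "finish_reason": "stop"}
  ]
}
""")

first = response["choices"][0]["text"]
print(first)  # Hello!
```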

@Smartappli I was looking at this a few months ago as well, because uv is a pretty amazing tool. The issue I ran into, however, is that it doesn't log...