Results 177 comments of Andrei

@ArtyomZemlyak that should be better documented, or maybe it shouldn't be the default behaviour. I'm currently working on #771, which will improve this by allowing multiple requests to be processed efficiently in parallel.

@MoeBuTa can you elaborate a little more: which chat format are you using, and could you share a code sample? It should be noted that because "auto" requires the LLM to choose the appropriate tool and...
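For reference, with `tool_choice="auto"` the model itself decides whether (and which) tool to call, which is why the chat format matters so much. A minimal sketch of the OpenAI-style request shape (the `get_weather` tool is a made-up example, not from this thread):

```python
# Hypothetical tool definition in the OpenAI function-calling schema.
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",  # made-up tool, for illustration only
            "description": "Look up the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
]

request = {
    "messages": [{"role": "user", "content": "Weather in Paris?"}],
    "tools": tools,
    # "auto" leaves the tool-vs-text decision entirely to the LLM.
    "tool_choice": "auto",
}
```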

Can you re-run with `--verbose` and paste the output here?

Hey @simonw! Big fan of your datasette project. I hear you, and I would like to make the setup process a little easier and less error-prone. Currently llama.cpp supports a...

Hey @simonw it took a while, but this is finally possible through a self-hosted PEP 503 repository on GitHub Pages (see https://github.com/abetlen/llama-cpp-python/pull/1247). You should now be able to specify:

```bash
pip...
```

Hey @CoffeeVampir3, sorry to take so long to look at this. As I see it, your PR adds the mirostat parameters as instance parameters; however, mirostat sampling is currently possible...
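For context, mirostat is a per-call sampling mode: each sampled token's surprise is fed back to adjust a truncation threshold `mu` toward a target `tau`. A rough sketch of that feedback step (illustrative only, not llama.cpp's exact implementation):

```python
import math

def mirostat_update(mu: float, token_prob: float,
                    tau: float = 5.0, eta: float = 0.1) -> float:
    """One feedback step of mirostat: nudge mu toward the target surprise."""
    surprise = -math.log2(token_prob)  # observed surprise of the sampled token
    error = surprise - tau             # deviation from the target tau
    return mu - eta * error            # learning-rate eta controls the step

mu = 2 * 5.0  # a common initialization: mu = 2 * tau
mu = mirostat_update(mu, token_prob=0.5)  # surprise = 1.0 < tau, so mu rises
```

Because `mu` is updated on every sampled token, passing the mirostat parameters per call (rather than as instance state) keeps independent generations from sharing a threshold.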

@tpfau thank you for starting on this. I'll review in more depth, but my initial request would be that instead of `stream_include_usage` we just add `stream_options` directly to the methods...
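In other words, mirroring the upstream OpenAI API shape, where usage reporting hangs off a single `stream_options` dict rather than a bespoke flag. A sketch of the request shape (an assumption about the intended API, not the final PR):

```python
request = {
    "model": "llama",  # placeholder model name
    "stream": True,
    # One extensible options dict instead of a stream_include_usage flag;
    # include_usage asks for a final chunk carrying token-usage stats.
    "stream_options": {"include_usage": True},
}
```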

Hey @parallaxe, the approach mentioned in that repo requires computing per-token attention from the outputs of specific transformer attention-head layers. This isn't currently supported by the llama.cpp...

Hey @parallaxe yes, you're correct that that should work. Right now I'm not exposing the `ggml` bindings directly in this project, but that's doable (I started on that in [`ggml-python`](https://github.com/abetlen/ggml-python) but...

Hope this can be a good starting point! First, we just need to update CMake to add the `ggml_shared` library.

`CMakeLists.txt`

```cmake
cmake_minimum_required(VERSION 3.21)
project(llama_cpp)
option(LLAMA_BUILD "Build llama.cpp shared library and...
```
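Once a `ggml_shared` target builds, the resulting library could be picked up from Python via `ctypes`. A minimal sketch (the `build/` directory and library filenames are assumptions for illustration, not the project's actual layout):

```python
import ctypes
import pathlib

def candidate_paths(name: str = "ggml_shared",
                    build_dir: str = "build") -> list[pathlib.Path]:
    # Platform-dependent filenames the shared library might get
    # (illustrative; actual output paths depend on the CMake generator).
    return [pathlib.Path(build_dir) / f"lib{name}{ext}"
            for ext in (".so", ".dylib", ".dll")]

def load_shared_lib(name: str = "ggml_shared",
                    build_dir: str = "build") -> ctypes.CDLL:
    # Try each candidate path and load the first one that exists.
    for path in candidate_paths(name, build_dir):
        if path.exists():
            return ctypes.CDLL(str(path))
    raise FileNotFoundError(f"no shared library found for {name!r}")
```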