Andrei
@ArtyomZemlyak that should be better documented, or maybe not be the default behaviour. Currently working on #771, which will improve this by allowing multiple requests to be processed efficiently in parallel.
@MoeBuTa can you elaborate a little more: which chat format are you using? Maybe share a code sample? It should be noted that "auto" requires the LLM to choose the appropriate tool and...
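For context, a minimal sketch (not from the original comment) of what an `"auto"` tool-choice request looks like in the OpenAI-style chat-completions schema that llama-cpp-python aims to be compatible with; the `get_weather` tool here is purely hypothetical:

```python
# Sketch of a chat-completion payload using tool calling with
# tool_choice="auto": the model itself decides whether to answer in
# plain text or to call one of the declared tools.

def build_tool_request(user_message: str) -> dict:
    """Build an OpenAI-style chat-completion payload with one example tool."""
    return {
        "messages": [{"role": "user", "content": user_message}],
        "tools": [
            {
                "type": "function",
                "function": {
                    "name": "get_weather",  # hypothetical tool, for illustration only
                    "description": "Get the current weather for a city.",
                    "parameters": {
                        "type": "object",
                        "properties": {"city": {"type": "string"}},
                        "required": ["city"],
                    },
                },
            }
        ],
        # "auto" leaves the tool-vs-text decision to the model; to force a
        # specific tool you would instead pass
        # {"type": "function", "function": {"name": "get_weather"}}.
        "tool_choice": "auto",
    }

request = build_tool_request("What's the weather in Perth?")
print(request["tool_choice"])  # auto
```

Because the model is free to skip the tool entirely under `"auto"`, callers should be prepared to handle both a plain text reply and a tool-call response.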
Can you re-run with `--verbose` and paste the output here?
Hey @simonw! Big fan of your Datasette project. I hear you, and I would like to make the setup process a little easier and less error-prone. Currently llama.cpp supports a...
Hey @simonw it took a while, but this is finally possible through a self-hosted PEP 503 repository on GitHub Pages (see https://github.com/abetlen/llama-cpp-python/pull/1247). You should now be able to specify:

```bash
pip...
```
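As a sketch of what installing from such a self-hosted index looks like (the exact index URL and hardware variants are defined by the project's own docs; the URL below is illustrative, so check the repository README for the current one):

```shell
# Install a prebuilt wheel from the self-hosted PEP 503 index on GitHub Pages
# rather than building from source. The index path is an assumption here;
# variants typically exist per backend (cpu, cuda, metal, ...).
pip install llama-cpp-python \
  --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cpu
```

The advantage of a PEP 503 index over attaching wheels to releases is that `pip` can resolve the right wheel for the platform and Python version automatically.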
Hey @CoffeeVampir3, sorry to take so long to look at this. As I see it, your PR adds the mirostat parameters as instance parameters; however, mirostat sampling is currently possible...
@tpfau thank you for starting on this. I'll review in more depth, but my initial request would be that instead of `stream_include_usage` we just add `stream_options` directly to the methods...
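To make the suggestion concrete, a sketch (my illustration, not code from the PR) of the OpenAI-style `stream_options` field that would replace a bespoke flag like `stream_include_usage`; with `{"include_usage": True}`, a streaming response ends with a final chunk carrying token-usage statistics:

```python
# Sketch: pass the whole stream_options object through, matching the
# OpenAI chat-completions schema, rather than adding a one-off boolean flag.

def build_streaming_request(prompt: str, include_usage: bool = True) -> dict:
    """Build a streaming chat-completion payload with stream_options."""
    return {
        "messages": [{"role": "user", "content": prompt}],
        "stream": True,
        # Forwarding the options object keeps the API aligned with OpenAI's
        # schema and leaves room for future stream options without new flags.
        "stream_options": {"include_usage": include_usage},
    }

print(build_streaming_request("hi")["stream_options"])  # {'include_usage': True}
```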
Hey @parallaxe, the approach mentioned in that repo requires computing per-token attention from the outputs of specific transformer attention-head layers. This isn't currently supported by the llama.cpp...
Hey @parallaxe, yes, you're correct that that should work. Right now I'm not exposing the `ggml` bindings directly in this project, but that's doable (I started on that in [`ggml-python`](https://github.com/abetlen/ggml-python), but...
Hope this can be a good starting point! First we just need to update CMake to add the `ggml_shared` library.

`CMakeLists.txt`:

```cmake
cmake_minimum_required(VERSION 3.21)
project(llama_cpp)
option(LLAMA_BUILD "Build llama.cpp shared library and...
```