Andrei
@ArtyomZemlyak that should be better documented, or maybe not be the default behaviour. Currently working on #771, which will improve this by allowing multiple requests to be processed efficiently in parallel.
@MoeBuTa can you elaborate a little more: which chat format are you using? Maybe share a code sample? It should be noted that "auto" requires the LLM to choose the appropriate tool and...
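For context, a minimal sketch (not from the original comment) of what an `"auto"` tool-choice request looks like in the OpenAI-style chat-completions schema that llama-cpp-python aims to be compatible with; the `get_weather` tool here is purely hypothetical:

```python
# Sketch of a chat-completion payload using tool calling with
# tool_choice="auto": the model itself decides whether to answer in
# plain text or to call one of the declared tools.

def build_tool_request(user_message: str) -> dict:
    """Build an OpenAI-style chat-completion payload with one example tool."""
    return {
        "messages": [{"role": "user", "content": user_message}],
        "tools": [
            {
                "type": "function",
                "function": {
                    "name": "get_weather",  # hypothetical tool, for illustration only
                    "description": "Get the current weather for a city.",
                    "parameters": {
                        "type": "object",
                        "properties": {"city": {"type": "string"}},
                        "required": ["city"],
                    },
                },
            }
        ],
        # "auto" leaves the tool-vs-text decision to the model; to force a
        # specific tool you would instead pass
        # {"type": "function", "function": {"name": "get_weather"}}.
        "tool_choice": "auto",
    }

request = build_tool_request("What's the weather in Perth?")
print(request["tool_choice"])  # auto
```

Because the model is free to skip the tool entirely under `"auto"`, callers should be prepared to handle both a plain text reply and a tool-call response.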
Can you re-run with `--verbose` and paste the output here?
Hey @simonw! Big fan of your Datasette project. I hear you, and I would like to make the setup process a little easier and less error-prone. Currently llama.cpp supports a...
Hey @simonw it took a while, but this is finally possible through a self-hosted PEP 503 repository on GitHub Pages (see https://github.com/abetlen/llama-cpp-python/pull/1247). You should now be able to specify:

```bash
pip...
```
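As a sketch of what installing from such a self-hosted index looks like (the exact index URL and hardware variants are defined by the project's own docs; the URL below is illustrative, so check the repository README for the current one):

```shell
# Install a prebuilt wheel from the self-hosted PEP 503 index on GitHub Pages
# rather than building from source. The index path is an assumption here;
# variants typically exist per backend (cpu, cuda, metal, ...).
pip install llama-cpp-python \
  --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cpu
```

The advantage of a PEP 503 index over attaching wheels to releases is that `pip` can resolve the right wheel for the platform and Python version automatically.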
Hey @CoffeeVampir3, sorry to take so long to look at this. As I see it, your PR adds the mirostat parameters as instance parameters; however, mirostat sampling is currently possible...
@tpfau thank you for starting on this. I'll review in more depth, but my initial request would be that instead of `stream_include_usage` we just add `stream_options` directly to the methods...
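To make the suggestion concrete, a sketch (my illustration, not code from the PR) of the OpenAI-style `stream_options` field that would replace a bespoke flag like `stream_include_usage`; with `{"include_usage": True}`, a streaming response ends with a final chunk carrying token-usage statistics:

```python
# Sketch: pass the whole stream_options object through, matching the
# OpenAI chat-completions schema, rather than adding a one-off boolean flag.

def build_streaming_request(prompt: str, include_usage: bool = True) -> dict:
    """Build a streaming chat-completion payload with stream_options."""
    return {
        "messages": [{"role": "user", "content": prompt}],
        "stream": True,
        # Forwarding the options object keeps the API aligned with OpenAI's
        # schema and leaves room for future stream options without new flags.
        "stream_options": {"include_usage": include_usage},
    }

print(build_streaming_request("hi")["stream_options"])  # {'include_usage': True}
```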
Hey @parallaxe, the approach mentioned in that repo requires computing per-token attention from the outputs of specific transformer attention-head layers. This isn't currently supported by the llama.cpp...
Hey @parallaxe, yes, you're correct that that should work. Right now I'm not exposing the `ggml` bindings directly in this project, but that's doable (I started on that in [`ggml-python`](https://github.com/abetlen/ggml-python), but...
Hope this can be a good starting point! First we just need to update CMake to add the `ggml_shared` library.

`CMakeLists.txt`:

```cmake
cmake_minimum_required(VERSION 3.21)
project(llama_cpp)
option(LLAMA_BUILD "Build llama.cpp shared library and...
```