Diner Burger

Results: 4 issues by Diner Burger

The `Get Keywords` node fails because llama-cpp-agent has refactored its structured output API. This moves back to a version from before April 29, 2024 so that the API matches what the node expects.

Both exllamav2 and llama.cpp support [quantized KV cache](https://github.com/ggerganov/llama.cpp/discussions/5932) to allow pretty large context lengths on consumer hardware. It would be a great addition to mistral.rs; I've been very interested in...
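As a rough illustration of why a quantized KV cache helps with long contexts, the sketch below estimates cache size from model shape. The dimensions (32 layers, 8 KV heads, head size 128) and the helper name `kv_cache_bytes` are illustrative assumptions, not figures from the linked discussion, and the estimate ignores the per-block scale overhead that real quantized formats carry.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, bits_per_value):
    """Approximate KV cache size in bytes for a single sequence.

    Counts one key and one value vector per layer, per KV head, per token.
    Quantization scale/zero-point overhead is ignored (an assumption).
    """
    values = 2 * n_layers * n_kv_heads * head_dim * seq_len
    return values * bits_per_value // 8


# Hypothetical model shape, chosen only for illustration.
shape = dict(n_layers=32, n_kv_heads=8, head_dim=128)

fp16 = kv_cache_bytes(seq_len=32768, bits_per_value=16, **shape)
q8 = kv_cache_bytes(seq_len=32768, bits_per_value=8, **shape)
q4 = kv_cache_bytes(seq_len=32768, bits_per_value=4, **shape)

for name, size in [("fp16", fp16), ("8-bit", q8), ("4-bit", q4)]:
    print(f"{name}: {size / 2**30:.2f} GiB")
```

Under these assumed dimensions, halving the bits per cached value halves the cache size, so the same memory budget holds roughly twice the context.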

new feature

## Describe the bug
Phi-4-MM crashes when sending an image via OpenWebUI with the error: `CUDA error at src/cuda/nonzero_bitwise.cu:138: invalid configuration argument`
## Latest commit or version
Which commit or...

bug

Let's try again on the correct branch lol. This patch adds quanto KV cache quantization support for Transformers. The placement is unfortunately a little awkward, but we need to pass...