Akarshan Biswas
Hopefully, my prompt-progress implementation will solve it, especially with a low batch size for running on weak CPUs. If it doesn't, this issue can be revisited. I am pushing...
I think this issue has been resolved, as we now have configurable timeout settings in llama.cpp. Please let us know if it still doesn't work. Closing it for now.
Needs UI design; the backend side is done.
> I believe the SYCL Q4_0 reorder optimizations resulted in this as setting GGML_SYCL_DISABLE_OPT=1 allowed things to run normally again cc @Rbiessy @NeoZhangJianyu @Alcpz ^
Just to confirm, gemma2's window size is hard-coded, right?
9B-IT is working great and now I can increase the ctx size. :)
Just to mention here: when I was converting the HF gemma2 to bf16 GGUF, I noticed that the norm tensors were converted to fp16 instead of being copied directly from...
This sounds like an issue when calling the set_tensor function inside ggml-vulkan.cpp. However, the message is very cryptic and doesn't provide much information. My suggestion is to run with VK_LOG_DEBUG=1 set...
For non-streaming responses, the reasoning is in choices[0].message.reasoning_content in the DeepSeek format. For streaming responses, it's in choices[0].delta.reasoning_content. This structure depends on the model's chat template; not all reasoning models do this...
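As a rough illustration (not from the original thread), here is a minimal sketch of reading that field from an OpenAI-compatible endpoint. The server URL, model name, and the assumption that the backend emits DeepSeek-style reasoning_content are mine:

```ts
// Minimal sketch (Node 18+). Assumptions: an OpenAI-compatible server at
// localhost:8080 and a model that emits DeepSeek-style reasoning_content.
const URL = "http://localhost:8080/v1/chat/completions";
const body = (stream: boolean) =>
  JSON.stringify({
    model: "deepseek-r1", // hypothetical model name
    messages: [{ role: "user", content: "Why is the sky blue?" }],
    stream,
  });

async function nonStreaming(): Promise<void> {
  const res = await fetch(URL, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: body(false),
  });
  const data = await res.json();
  // Non-streaming: reasoning sits next to the regular content on `message`.
  console.log(data.choices?.[0]?.message?.reasoning_content);
  console.log(data.choices?.[0]?.message?.content);
}

async function streaming(): Promise<void> {
  const res = await fetch(URL, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: body(true),
  });
  // Streaming: each SSE chunk carries a delta; reasoning arrives as
  // choices[0].delta.reasoning_content before the answer tokens.
  const reader = res.body!.getReader();
  const decoder = new TextDecoder();
  let buffer = "";
  while (true) {
    const { value, done } = await reader.read();
    if (done) break;
    buffer += decoder.decode(value, { stream: true });
    const lines = buffer.split("\n");
    buffer = lines.pop() ?? ""; // keep any partial line for the next chunk
    for (const line of lines) {
      if (!line.startsWith("data: ") || line.includes("[DONE]")) continue;
      const delta = JSON.parse(line.slice("data: ".length)).choices?.[0]?.delta ?? {};
      if (delta.reasoning_content) process.stdout.write(delta.reasoning_content);
      if (delta.content) process.stdout.write(delta.content);
    }
  }
}

nonStreaming().then(streaming).catch(console.error);
```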
Currently, import() in the new llamacpp extension does something similar: the model file stays in its original location. We could introduce a recursive 'model folder' import() to achieve something similar to what has...
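As a rough sketch of what a recursive 'model folder' import could look like (the names and types here are hypothetical, not the extension's actual API), the idea is to walk the folder, find every .gguf file, and register it by its existing path instead of copying it:

```ts
import { readdir } from "node:fs/promises";
import { join, extname } from "node:path";

// Hypothetical shape of an imported model entry; the real extension's type will differ.
interface ImportedModel {
  id: string;
  path: string; // original location on disk, never copied
}

// Recursively walk `dir` and register every .gguf file in place.
async function importModelFolder(dir: string): Promise<ImportedModel[]> {
  const found: ImportedModel[] = [];
  const entries = await readdir(dir, { withFileTypes: true });
  for (const entry of entries) {
    const fullPath = join(dir, entry.name);
    if (entry.isDirectory()) {
      // Recurse so nested model directories are picked up too.
      found.push(...(await importModelFolder(fullPath)));
    } else if (extname(entry.name).toLowerCase() === ".gguf") {
      found.push({ id: entry.name, path: fullPath });
    }
  }
  return found;
}

// Example usage: importModelFolder("/path/to/models").then(console.log);
```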