Piotr Wilkin (ilintar)
I'm sorry, I've been really busy lately, but I promise I'll take a look soon!
Feature Request: Support for Microsoft's Phi-4-mini-flash-reasoning and Nvidia's Nemotron-nano-9b-v2
Nemotron is already supported.
Feature Request: Support for Microsoft's Phi-4-mini-flash-reasoning and Nvidia's Nemotron-nano-9b-v2
> > Nemotron is already supported.
>
> I am trying to find GGUF for this but was not successful. Can you help with this please? Thanks!

https://huggingface.co/bartowski/nvidia_NVIDIA-Nemotron-Nano-12B-v2-GGUF
https://huggingface.co/bartowski/nvidia_NVIDIA-Nemotron-Nano-9B-v2-GGUF
@gabe-l-hart as a side note, I've added `cumsum` and `tri` as new ops during the Qwen3Next implementation, so that might allow for some decoupling.
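For context, a rough illustration of what those two ops compute, using NumPy purely as a stand-in; the actual GGML-side signatures are whatever landed in the Qwen3Next changes and may differ:

```python
import numpy as np

# Cumulative sum along a sequence, the building block for running-sum style
# recurrences in linear-attention variants.
x = np.array([1.0, 2.0, 3.0, 4.0])
print(np.cumsum(x))            # [ 1.  3.  6. 10.]

# Lower-triangular matrix of ones, handy for building causal / decay masks
# without an explicit per-position loop.
print(np.tri(4, dtype=np.float32))
# [[1. 0. 0. 0.]
#  [1. 1. 0. 0.]
#  [1. 1. 1. 0.]
#  [1. 1. 1. 1.]]
```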
I'll try, but I might need help from some competent people (@CISC @ngxson) because the model has some pretty atypical tensor configurations (there are double expert layers basically, a big...
Also, just FYI: vLLM does not support the visual component yet, for exactly the same reason: the implementation is very complex. As far as I know, the support for the...
Seems there's a bug in the current code when executing streaming tool calls under reasoning models. I'm trying it with Qwen3 and the following sequence causes a server crash: *...
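This is not the exact crashing sequence, but as a minimal sketch of the kind of request involved (assuming a local llama-server on port 8080 serving a Qwen3 model, and the OpenAI-compatible `/v1/chat/completions` endpoint; the tool definition is made up for illustration):

```python
import json
import requests

# Streaming chat-completion request that exercises the tool-call path.
payload = {
    "model": "qwen3",
    "stream": True,
    "messages": [{"role": "user", "content": "What's the weather in Warsaw?"}],
    "tools": [{
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }],
}

with requests.post("http://localhost:8080/v1/chat/completions",
                   json=payload, stream=True) as resp:
    for line in resp.iter_lines():
        # SSE chunks arrive as "data: {...}" lines, terminated by "data: [DONE]".
        if line and line.startswith(b"data: ") and line != b"data: [DONE]":
            chunk = json.loads(line[len(b"data: "):])
            print(chunk["choices"][0]["delta"])
```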
@ggerganov any chance for a GGML sync?
> So, if I understand correctly what you want is to enable tool handling even when `tools` is not provided?

I think he wants to enable tool handling when tools...
So basically, the feature request, if I understand it correctly, is "properly handle cases where tool definitions are hardcoded in the template instead of passed via the tools parameter at...
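To make the distinction concrete, here is a hedged sketch of the two request shapes (hypothetical payloads, not taken from the issue): in the first, the server sees a `tools` array and knows to parse tool calls; in the second, the tool definitions live in the chat template or system prompt, so the request carries no `tools` field, yet the model may still emit tool calls that should be parsed.

```python
# Case 1: tools passed via the API parameter -- the server has an explicit
# signal that tool-call parsing should be active for this request.
request_with_tools = {
    "messages": [{"role": "user", "content": "Look up order 1234"}],
    "tools": [{
        "type": "function",
        "function": {
            "name": "lookup_order",
            "parameters": {
                "type": "object",
                "properties": {"id": {"type": "string"}},
            },
        },
    }],
}

# Case 2: no `tools` field at all -- the tool definitions are hardcoded in the
# template (or system prompt), so the server currently has no signal that the
# model's output may contain tool calls it should turn into `tool_calls`.
request_without_tools = {
    "messages": [
        {"role": "system",
         "content": "You can call lookup_order(id) by emitting a <tool_call> block."},
        {"role": "user", "content": "Look up order 1234"},
    ],
}
```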