Piotr Wilkin (ilintar)

77 comments by Piotr Wilkin (ilintar)

I'm sorry, been really busy lately but I promise I'll take a look soon!

> > Nemotron is already supported.
>
> I am trying to find GGUF for this but was not successful. Can you help with this please? Thanks!

https://huggingface.co/bartowski/nvidia_NVIDIA-Nemotron-Nano-12B-v2-GGUF
https://huggingface.co/bartowski/nvidia_NVIDIA-Nemotron-Nano-9B-v2-GGUF

@gabe-l-hart as a side note, I've added `cumsum` and `tri` as new ops during the Qwen3Next implementation, so that might allow for some decoupling.

I'll try, but I might need help from some competent people (@CISC @ngxson) because the model has some pretty atypical tensor configurations (there are double expert layers basically, a big...

Also, just FYI: vLLM does not support the visual component yet, for exactly the same reason: the implementation is very complex. As far as I know, the support for the...

Seems there's a bug in the current code version with executing streaming tools under reasoning models. I'm trying it with Qwen3 and the following sequence causes a server crash: *...

> So, if I understand correctly, what you want is to enable tool handling even when `tools` is not provided?

I think he wants to enable tool handling when tools...

So basically, the feature request, if I understand it correctly, is "properly handle cases where tool definitions are hardcoded in the template instead of passed via the tools parameter at...