Piotr Wilkin (ilintar)
I'm sorry, I've been really busy lately, but I promise I'll take a look soon!
Feature Request: Support for Microsoft's Phi-4-mini-flash-reasoning and Nvidia's Nemotron-nano-9b-v2
Nemotron is already supported.
Feature Request: Support for Microsoft's Phi-4-mini-flash-reasoning and Nvidia's Nemotron-nano-9b-v2
> > Nemotron is already supported.
>
> I am trying to find GGUF for this but was not successful. Can you help with this please? Thanks!

https://huggingface.co/bartowski/nvidia_NVIDIA-Nemotron-Nano-12B-v2-GGUF
https://huggingface.co/bartowski/nvidia_NVIDIA-Nemotron-Nano-9B-v2-GGUF
@gabe-l-hart as a side note, I've added `cumsum` and `tri` as new ops during the Qwen3Next implementation, so that might allow for some decoupling.
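For context, a rough illustration of what those two ops compute, using NumPy purely as a stand-in; the actual GGML-side signatures are whatever landed in the Qwen3Next changes and may differ:

```python
import numpy as np

# Cumulative sum along a sequence, the building block for running-sum style
# recurrences in linear-attention variants.
x = np.array([1.0, 2.0, 3.0, 4.0])
print(np.cumsum(x))            # [ 1.  3.  6. 10.]

# Lower-triangular matrix of ones, handy for building causal / decay masks
# without an explicit per-position loop.
print(np.tri(4, dtype=np.float32))
# [[1. 0. 0. 0.]
#  [1. 1. 0. 0.]
#  [1. 1. 1. 0.]
#  [1. 1. 1. 1.]]
```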
I'll try, but I might need help from some competent people (@CISC @ngxson) because the model has some pretty atypical tensor configurations (there are double expert layers basically, a big...
Also, just FYI: vLLM does not support the visual component yet, for exactly the same reason: the implementation is very complex. As far as I know, the support for the...
Seems there's a bug in the current code when executing streaming tool calls under reasoning models. I'm trying it with Qwen3 and the following sequence causes a server crash: *...
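This is not the exact crashing sequence, but as a minimal sketch of the kind of request involved (assuming a local llama-server on port 8080 serving a Qwen3 model, and the OpenAI-compatible `/v1/chat/completions` endpoint; the tool definition is made up for illustration):

```python
import json
import requests

# Streaming chat-completion request that exercises the tool-call path.
payload = {
    "model": "qwen3",
    "stream": True,
    "messages": [{"role": "user", "content": "What's the weather in Warsaw?"}],
    "tools": [{
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }],
}

with requests.post("http://localhost:8080/v1/chat/completions",
                   json=payload, stream=True) as resp:
    for line in resp.iter_lines():
        # SSE chunks arrive as "data: {...}" lines, terminated by "data: [DONE]".
        if line and line.startswith(b"data: ") and line != b"data: [DONE]":
            chunk = json.loads(line[len(b"data: "):])
            print(chunk["choices"][0]["delta"])
```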
@ggerganov any chance for a GGML sync?
> So, if I understand correctly what you want is to enable tool handling even when `tools` is not provided?

I think he wants to enable tool handling when tools...
So basically, the feature request, if I understand it correctly, is "properly handle cases where tool definitions are hardcoded in the template instead of passed via the tools parameter at...
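To make the distinction concrete, here is a hedged sketch of the two request shapes (hypothetical payloads, not taken from the issue): in the first, the server sees a `tools` array and knows to parse tool calls; in the second, the tool definitions live in the chat template or system prompt, so the request carries no `tools` field, yet the model may still emit tool calls that should be parsed.

```python
# Case 1: tools passed via the API parameter -- the server has an explicit
# signal that tool-call parsing should be active for this request.
request_with_tools = {
    "messages": [{"role": "user", "content": "Look up order 1234"}],
    "tools": [{
        "type": "function",
        "function": {
            "name": "lookup_order",
            "parameters": {
                "type": "object",
                "properties": {"id": {"type": "string"}},
            },
        },
    }],
}

# Case 2: no `tools` field at all -- the tool definitions are hardcoded in the
# template (or system prompt), so the server currently has no signal that the
# model's output may contain tool calls it should turn into `tool_calls`.
request_without_tools = {
    "messages": [
        {"role": "system",
         "content": "You can call lookup_order(id) by emitting a <tool_call> block."},
        {"role": "user", "content": "Look up order 1234"},
    ],
}
```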