Piotr Wilkin (ilintar)
Basically, if you want a quick working solution like mine, if a somewhat UX-unfriendly one, you can look at what I did in `common_chat_parse_nemotron_v2`.
Yeah, as I said, I'm aware of this, but the Qwen3 Next conversion is proving to be extremely time-consuming, to say the least. We don't really need tests on...
> 2. A full repro with the error it's raising would definitely help debug

Running `llama-cli -m reference/qwen3_next_500m/Qwen3_Next_500M-8x417M-BF16.gguf -ngl 999 -p "Who are "` yields this weird memory error: ```console...
Now this is an error I didn't expect to encounter: `GGML_ABORT("not enough space in the context's memory pool");`
> The model doesn't seem to have any recurrence layers. This makes the set input fails due to input node not being present in cgraph. How do I allocate the...
> @pwilkin any chance to buy you a coffee? (Patreon etc.) so community able to donate for your efforts. Thank you!

Added a buymeacoffee link to my profile (do consider first...
@ngxson Thanks, `scale_bias` was one op I was missing in my endeavors :> I got an LLM to rewrite the internal delta into tensor logic. After a day of manually...
> Honestly I would prefer taking time to understand the mamba/ssm implementation then writing the code manually. Code written by LLM are mostly attempts for 1-to-1 translation from pytorch -->...
Aight, I cleaned up the main graph calculation, now I have to figure out how to include `conv_states_all` in my delta_net function in order to not get the memory error.
> if i may ask you Petter, do you think that managing this model to work will be as hard as some people say?

No, it's difficult as there are...