llama-cpp-python
Thinking toggle support for Qwen related models
Is your feature request related to a problem? Please describe.
Thinking cannot be toggled off in Qwen models. When we try to disable it through the user prompt, the model still emits the opening and closing think tags.
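For reference, this is roughly what the user-prompt workaround looks like with the current llama-cpp-python API. The model path is a placeholder, and I am assuming Qwen3's "/no_think" soft switch is what is meant by the user-prompt approach:

```python
from llama_cpp import Llama

# Placeholder model path; any Qwen3 GGUF exhibits the same behaviour.
llm = Llama(model_path="./Qwen3-8B-Q4_K_M.gguf", n_ctx=4096)

# The only toggle available today is Qwen's "/no_think" soft switch in the
# user prompt; even then the reply typically still contains the opening and
# closing <think></think> tags.
out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "What is 2 + 2? /no_think"}]
)
print(out["choices"][0]["message"]["content"])  # usually still begins with "<think>\n\n</think>"
```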
Describe the solution you'd like
Support for parameters / flags such as reasoning_budget and chat_template_kwargs (a rough sketch of what this could look like is shown below).

Describe alternatives you've considered
N/A
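As a rough illustration of the request, reusing the llm instance from the snippet above. Neither chat_template_kwargs nor reasoning_budget exists in llama-cpp-python today; the names are only borrowed from the upstream llama.cpp controls:

```python
# Hypothetical sketch only: these keyword arguments are not real
# llama-cpp-python parameters yet; this is the kind of interface being asked for.
out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "What is 2 + 2?"}],
    chat_template_kwargs={"enable_thinking": False},  # hypothetical: forwarded to the Jinja chat template
    # reasoning_budget=0,  # hypothetical: 0 = no thinking, -1 = unlimited
)
print(out["choices"][0]["message"]["content"])
```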
Additional context
This has long been a point of discussion in the upstream project (llama.cpp) as well, and I am hoping the support trickles down here, noting that the relevant PRs have been merged and llama.cpp now has controls over thinking.
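For context, this is roughly how I understand the merged upstream controls are exposed by llama-server's OpenAI-compatible endpoint; the field names are taken from the upstream discussion and may differ from the final implementation:

```python
import requests

# Assumes llama-server was started with something like:
#   llama-server -m Qwen3-8B-Q4_K_M.gguf --jinja
# (as I understand it, --reasoning-budget 0 can also disable thinking globally)
resp = requests.post(
    "http://localhost:8080/v1/chat/completions",
    json={
        "messages": [{"role": "user", "content": "What is 2 + 2?"}],
        "chat_template_kwargs": {"enable_thinking": False},  # per-request toggle
    },
)
print(resp.json()["choices"][0]["message"]["content"])
```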