How to turn off the thinking mode
How to turn off the thinking mode training if I don't want to use it with the Qwen3 series models?
So you want to remove the thinking tokens from the response for training? I think that will cause discrepency between training and inference.
I'm also interested. There is a enable_thinking parameter in huggingface apply_chat_template function for Qwen3 models but I did not know where to pass it within agent-lightning/verl. I suppose that if we put the same value in training and inference this would not cause any discrepancy
Has verl figured that out? If they haven't, we are unable to help despite we want to.