
issue on activating thinking mode with lmdeploy

yijunCai opened this issue on Sep 04 '25 · 2 comments

Hi, thanks for sharing the InternVL3.5 series! The thinking mode can be activated by setting the system prompt when running inference with transformers, but how should it be done when running offline inference with lmdeploy (using lmdeploy's pipeline API)? Specifically, how should I change the following code to activate thinking mode?

```python
from lmdeploy import pipeline, PytorchEngineConfig
from lmdeploy.vl import load_image

pipe = pipeline("OpenGVLab/InternVL3_5-30B-A3B",
                backend_config=PytorchEngineConfig(session_len=max_length, tp=2))
input = [(prompt, load_image(image_url))]
responses = pipe(input)
```

— yijunCai, Sep 04 '25

```python
from lmdeploy import GenerationConfig

# R1_SYSTEM_PROMPT is the thinking-mode system prompt from the InternVL3.5 model card.
content = [{'type': 'text', 'text': question}]
messages = [
    dict(role='system', content=R1_SYSTEM_PROMPT),
    dict(role='user', content=content),
]
out = pipe(messages, gen_config=GenerationConfig(do_sample=True, temperature=0.6))
print(out.text)
```
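For the original multimodal case, the same system-prompt pattern should extend to a user turn that carries both text and an image. Below is a minimal sketch, assuming lmdeploy accepts OpenAI-style `image_url` content entries (check the lmdeploy VLM docs for the exact schema); `max_length`, `prompt`, `image_url`, and `R1_SYSTEM_PROMPT` are as defined above:

```python
from lmdeploy import pipeline, GenerationConfig, PytorchEngineConfig

# Same pipeline construction as in the question.
pipe = pipeline("OpenGVLab/InternVL3_5-30B-A3B",
                backend_config=PytorchEngineConfig(session_len=max_length, tp=2))

messages = [
    # Thinking-mode system prompt, as in the reply above.
    dict(role='system', content=R1_SYSTEM_PROMPT),
    # Text plus image in a single user turn, OpenAI-style.
    dict(role='user', content=[
        {'type': 'text', 'text': prompt},
        {'type': 'image_url', 'image_url': {'url': image_url}},
    ]),
]
out = pipe(messages, gen_config=GenerationConfig(do_sample=True, temperature=0.6))
print(out.text)
```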

— JoshonSmith, Sep 05 '25


OK, this works, thanks! I also have a follow-up question: I want to get the logits of the final output (after "</think>"). Following the lmdeploy docs, I tried setting gen_config=GenerationConfig(output_logits='generation') (together with the recommended do_sample=True, temperature=0.6 for thinking mode), but response.logits in the pipe() output seems to be None. How can I get the logits of the final output properly?
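Concretely, the attempt looks like the sketch below. The slicing step at the end is hypothetical: it assumes `response.token_ids` and `pipe.tokenizer` are available in this lmdeploy version and that `</think>` encodes to a single token, neither of which is guaranteed for the VL pipeline:

```python
from lmdeploy import GenerationConfig

gen_config = GenerationConfig(do_sample=True, temperature=0.6,
                              output_logits='generation')
out = pipe(messages, gen_config=gen_config)

if out.logits is None:
    # Observed behavior: logits come back as None.
    print('no logits returned')
else:
    # One row of vocabulary logits per generated token. Keep only the
    # rows after the </think> marker, i.e. the final answer.
    end_id = pipe.tokenizer.encode('</think>', add_bos=False)[-1]  # hypothetical lookup
    cut = out.token_ids.index(end_id) + 1
    answer_logits = out.logits[cut:]
```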

— yijunCai, Sep 11 '25