Use torch.compile with MiniCPM-o
Hi, is it possible to use torch.compile (https://pytorch.org/tutorials/intermediate/torch_compile_tutorial.html) to speed up streaming audio inference with MiniCPM-o? This is the code we are using for inference right now:
```python
import torch
from transformers import AutoModel, AutoTokenizer

self.model = (
    AutoModel.from_pretrained(
        "openbmb/MiniCPM-o-2_6",
        trust_remote_code=True,
        attn_implementation="sdpa",
        torch_dtype=torch.bfloat16,
    )
    .eval()
    .to(device)
)
self._tokenizer = AutoTokenizer.from_pretrained(
    "openbmb/MiniCPM-o-2_6", trust_remote_code=True, revision=model_revision
)
self.init_tts()
# ...
response_generator = self.model.streaming_generate(
    session_id=self.session_id,
    tokenizer=self._tokenizer,
    temperature=self.config.temperature,
    generate_audio=self._generate_audio,
)
```
Thank you for using MiniCPM-o. torch.compile does leave some room for compilation optimization and acceleration here, but we haven't actually tested it ourselves, so please feel free to give it a try. We'd really appreciate it if you could share your findings.
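If you do experiment, here is an untested sketch of where torch.compile would typically be applied. Note that `streaming_generate` is a custom method: compiling the top-level model only intercepts its `forward()`, so it may not touch the generation hot path. The `llm` attribute used below is an assumption about the remote modeling code, not a documented API.

```python
import torch

# Untested sketch: wrapping the whole model only compiles forward(),
# so the custom streaming_generate path may run unchanged.
# self.model = torch.compile(self.model)

# Likely more effective: compile just the decoder backbone, whose
# forward() runs on every decoding step. `self.model.llm` is an
# assumption about MiniCPM-o's remote modeling code.
self.model.llm = torch.compile(
    self.model.llm,
    mode="reduce-overhead",  # CUDA graphs to cut per-step launch overhead
    dynamic=True,            # streaming shapes vary; avoid recompilation storms
)
```

Expect the first call after compiling to be slow while the graph is traced and compiled, so benchmark only after a few warm-up turns.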