Use torch.compile with MiniCPM-o
Hi, is it possible to use torch.compile (https://pytorch.org/tutorials/intermediate/torch_compile_tutorial.html) to speed up streaming audio inference with MiniCPM-o? This is the code we are using for inference right now:
```python
import torch
from transformers import AutoModel, AutoTokenizer

self.model = (
    AutoModel.from_pretrained(
        "openbmb/MiniCPM-o-2_6",
        trust_remote_code=True,
        attn_implementation="sdpa",
        torch_dtype=torch.bfloat16,
    )
    .eval()
    .to(device)
)
self._tokenizer = AutoTokenizer.from_pretrained(
    "openbmb/MiniCPM-o-2_6", trust_remote_code=True, revision=model_revision
)
self.init_tts()
# ...
response_generator = self.model.streaming_generate(
    session_id=self.session_id,
    tokenizer=self._tokenizer,
    temperature=self.config.temperature,
    generate_audio=self._generate_audio,
)
```
Thank you for using MiniCPM-o. torch.compile does leave some room for compilation optimization and acceleration here, but we haven't actually tested it ourselves, so please feel free to give it a try. We'd really appreciate it if you could share your findings.
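If you do experiment, here is an untested sketch of where torch.compile would typically be applied. Note that `streaming_generate` is a custom method: compiling the top-level model only intercepts its `forward()`, so it may not touch the generation hot path. The `llm` attribute used below is an assumption about the remote modeling code, not a documented API.

```python
import torch

# Untested sketch: wrapping the whole model only compiles forward(),
# so the custom streaming_generate path may run unchanged.
# self.model = torch.compile(self.model)

# Likely more effective: compile just the decoder backbone, whose
# forward() runs on every decoding step. `self.model.llm` is an
# assumption about MiniCPM-o's remote modeling code.
self.model.llm = torch.compile(
    self.model.llm,
    mode="reduce-overhead",  # CUDA graphs to cut per-step launch overhead
    dynamic=True,            # streaming shapes vary; avoid recompilation storms
)
```

Expect the first call after compiling to be slow while the graph is traced and compiled, so benchmark only after a few warm-up turns.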