[v1] torch.compile integration explanation
As we roll out v1 to the public, we need to explain how we use torch.compile.
Hey @youkaichao, I observed that in VLMs the language model uses torch.compile but the ViT does not. For example, in Qwen2-VL, the Qwen2Model definition has the compile decorator:
```python
...

@support_torch_compile(
    dynamic_arg_dims={
        "input_ids": 0,
        # positions is of shape (3, seq_len) if mrope is enabled for qwen2-vl,
        # otherwise (seq_len, ).
        "positions": -1,
        "intermediate_tensors": 0,
        "inputs_embeds": 0,
    })
class Qwen2Model(nn.Module):

    def __init__(self, *, vllm_config: VllmConfig, prefix: str = ""):
        super().__init__()
        ...
```
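For context, `dynamic_arg_dims` tells the compilation wrapper which dimension of each argument may change between calls, so a different sequence length does not force a recompile; in plain PyTorch this is roughly the role of `torch._dynamo.mark_dynamic`. A minimal, illustrative sketch of that idea (the toy module below is made up for this example and is not vLLM code):

```python
import torch
import torch.nn as nn


class TinyModel(nn.Module):
    """Stand-in for a decoder layer; only here to illustrate dynamic dims."""

    def __init__(self, hidden: int = 64):
        super().__init__()
        self.proj = nn.Linear(hidden, hidden)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.relu(self.proj(x))


model = TinyModel()
compiled = torch.compile(model)

x = torch.randn(8, 64)  # (seq_len, hidden)
# Mark dim 0 (seq_len) as dynamic, analogous to dynamic_arg_dims={"x": 0},
# so later calls with a different seq_len can reuse the same compiled graph.
torch._dynamo.mark_dynamic(x, 0)
out = compiled(x)
```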
but in the definition of Qwen2VisionTransformer there is no such decorator:
```python
...

class Qwen2VisionTransformer(nn.Module):

    def __init__(
        self,
        vision_config: Qwen2VLVisionConfig,
        norm_eps: float = 1e-6,
        quant_config: Optional[QuantizationConfig] = None,
        prefix: str = "",
    ) -> None:
        ...
```
I tried applying torch.compile directly to the forward function of some modules in the ViT, but saw no acceleration.
Is it necessary to use torch.compile for a VLM's ViT? Will it give a speed-up? If so, how can it be achieved? Thanks for the advice.
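To make the "direct" attempt above concrete, here is a rough sketch of compiling a single ViT block's forward; the block below is a generic stand-in, not the actual Qwen2VisionTransformer code, and the shapes are only illustrative:

```python
import torch
import torch.nn as nn


class ToyViTBlock(nn.Module):
    """Generic pre-norm transformer block used only for illustration."""

    def __init__(self, dim: int = 256, heads: int = 8):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(
            nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.norm1(x)
        x = x + self.attn(h, h, h, need_weights=False)[0]
        return x + self.mlp(self.norm2(x))


device = "cuda" if torch.cuda.is_available() else "cpu"
block = ToyViTBlock().to(device).eval()
# Compile only this module's forward, as described above.
block.forward = torch.compile(block.forward)

with torch.inference_mode():
    x = torch.randn(1, 1024, 256, device=device)  # (batch, num_patches, dim)
    out = block(x)
    # Note: num_patches changes with the image grid, so without marking that
    # dimension dynamic, new image sizes can trigger recompilations, which is
    # one possible reason a naive torch.compile shows no net speed-up here.
```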
> Is it necessary to use torch.compile for a VLM's ViT? Will it give a speed-up? If so, how can it be achieved?
This needs to be investigated. I think @ywang96 has some initial ideas.