
[v1] torch.compile integration explanation

Open youkaichao opened this issue 9 months ago • 1 comment

As we roll out v1 to the public, we need to explain how we use torch.compile.

youkaichao avatar Mar 07 '25 12:03 youkaichao

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, they only run the fastcheck CI, which runs a small, essential subset of CI tests to quickly catch errors. You can run additional CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

github-actions[bot] avatar Mar 07 '25 12:03 github-actions[bot]

> As we roll out v1 to the public, we need to explain how we use torch.compile.

Hey @youkaichao, I noticed that in VLMs the language model uses torch.compile but the ViT does not. For example, in Qwen2-VL the Qwen2Model definition has the compile decorator:

```python
...
@support_torch_compile(
    dynamic_arg_dims={
        "input_ids": 0,
        # positions is of shape (3, seq_len) if mrope is enabled for qwen2-vl,
        # otherwise (seq_len, ).
        "positions": -1,
        "intermediate_tensors": 0,
        "inputs_embeds": 0,
    })
class Qwen2Model(nn.Module):

    def __init__(self, *, vllm_config: VllmConfig, prefix: str = ""):
        super().__init__()
...
```
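As I understand it, dynamic_arg_dims tells the compiler which dimension of each forward argument varies across calls, so one compiled graph can serve all sequence lengths instead of recompiling per shape. In plain PyTorch the same idea looks roughly like torch._dynamo.mark_dynamic; here is a minimal sketch with a hypothetical toy module, not vLLM's actual machinery:

```python
import torch

class TinyModel(torch.nn.Module):
    """Hypothetical stand-in for Qwen2Model, just to show the mechanism."""

    def __init__(self) -> None:
        super().__init__()
        self.proj = torch.nn.Linear(64, 64)

    def forward(self, inputs_embeds: torch.Tensor) -> torch.Tensor:
        return self.proj(inputs_embeds)

model = torch.compile(TinyModel())

x = torch.randn(17, 64)           # (seq_len, hidden_size); seq_len varies per batch
torch._dynamo.mark_dynamic(x, 0)  # dim 0 is dynamic, like "inputs_embeds": 0 above
model(x)                          # compiles once; later seq_lens can reuse the graph
```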

but the definition of Qwen2VisionTransformer has no such decorator:

```python
...
class Qwen2VisionTransformer(nn.Module):

    def __init__(
        self,
        vision_config: Qwen2VLVisionConfig,
        norm_eps: float = 1e-6,
        quant_config: Optional[QuantizationConfig] = None,
        prefix: str = "",
    ) -> None:
...
```

I tried applying torch.compile directly to the forward functions of some modules in the ViT, but saw no speedup.
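What I tried looked roughly like this (a simplified sketch with a made-up block and shapes, not the actual Qwen2VisionTransformer code):

```python
import torch

# Stand-in for one ViT block; the real model uses vLLM's own attention layers.
block = torch.nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True)
compiled_block = torch.compile(block)

patches = torch.randn(1, 196, 64)        # (batch, num_patches, d_model)
torch._dynamo.mark_dynamic(patches, 1)   # num_patches varies with image size
out = compiled_block(patches)
```

My guess is that if num_patches changes with every image and is not marked dynamic, each new shape triggers a recompile, which could swallow any gains.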

Is it necessary to use torch.compile for a VLM's ViT? Would it give a speedup? If so, how can it be achieved? Thanks for the advice.

lessmore991 avatar Mar 18 '25 03:03 lessmore991

> Is it necessary to use torch.compile for a VLM's ViT? Would it give a speedup? If so, how can it be achieved? Thanks for the advice.

This needs to be investigated. I think @ywang96 has some initial ideas.

youkaichao avatar Mar 22 '25 04:03 youkaichao