CUDA delegate: fuse int4 kernel for better performance
:link: Helpful Links
:test_tube: See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/15813
- :page_facing_up: Preview Python docs built from this PR
Note: Links to docs will display an error until the docs builds have been completed.
:x: 4 New Failures
As of commit 3d749b5f98175be40d7156c8f8849ec348e789e9 with merge base da6306f4863f7eb16c27337cd8a42aa9d4ac4be7:
NEW FAILURES - The following jobs have failed:
- Lint / lintrunner / linux-job (gh)
- pull / test-moshi-linux / linux-job (gh)
  RuntimeError: Command docker exec -t 9907ddb62d97ec7acd7082a1c66e47d548d803a5f431bd36ae431972bb439c5b /exec failed with exit code 1
- Test CUDA Builds / test-model-cuda-e2e (openai, whisper-large-v3-turbo, quantized-int4-tile-packed) / linux-job (gh)
  RuntimeError: Command docker exec -t 44e8dc9bb0bd25aeec97801ad7169c81e416b57da216ece37849b4564ac3aa94 /exec failed with exit code 1
- Test CUDA Builds / test-model-cuda-e2e (openai, whisper-small, quantized-int4-tile-packed) / linux-job (gh)
  RuntimeError: Command docker exec -t 91c62d8959f3759f41e3a1955ed27434b683a2f04050b290fe7bd7ffb6a70d26 /exec failed with exit code 1
This comment was automatically generated by Dr. CI and updates every 15 minutes.
This PR needs a release notes: label
If your change should be included in the release notes (i.e., would users of this library care about this change?), please use a label starting with release notes:. This helps us track your change and include it in the next release notes.
To add a label, comment to pytorchbot, for example:
@pytorchbot label "release notes: none"
For more information, see https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.
Is it possible to make this part of the Inductor freezing optimization? I think it would benefit all models, beyond ET use cases.
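For context on what a fused int4 kernel decodes: int4 weight-only quantization stores two 4-bit values per byte, and a fused kernel unpacks them on the fly instead of materializing a full-precision weight tensor. A minimal pure-Python sketch of that packing layout (helper names are mine, not from this PR; low nibble holds the even-indexed value):

```python
def pack_int4(values):
    """Pack pairs of 4-bit unsigned ints (0..15) into bytes, low nibble first."""
    assert len(values) % 2 == 0, "int4 packing expects an even element count"
    return bytes(
        (values[i] & 0xF) | ((values[i + 1] & 0xF) << 4)
        for i in range(0, len(values), 2)
    )

def unpack_int4(packed):
    """Inverse of pack_int4: expand each byte back into two 4-bit values."""
    out = []
    for b in packed:
        out.append(b & 0xF)   # low nibble = even-indexed value
        out.append(b >> 4)    # high nibble = odd-indexed value
    return out

# Round-trip check: packing then unpacking recovers the original values.
vals = [1, 15, 7, 0]
assert unpack_int4(pack_int4(vals)) == vals
```

A fused CUDA kernel would perform the `unpack_int4` step (plus dequantization with scales/zero points) in registers inside the matmul, which is why folding the packing into a graph-level optimization such as Inductor freezing is attractive.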