CUDA delegate: fuse int4 kernel for better performance
:link: Helpful Links
:test_tube: See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/15813
- :page_facing_up: Preview Python docs built from this PR
Note: Links to docs will display an error until the docs builds have been completed.
:x: 4 New Failures
As of commit 3d749b5f98175be40d7156c8f8849ec348e789e9 with merge base da6306f4863f7eb16c27337cd8a42aa9d4ac4be7:
NEW FAILURES - The following jobs have failed:
- Lint / lintrunner / linux-job (gh)
- pull / test-moshi-linux / linux-job (gh)
  RuntimeError: Command docker exec -t 9907ddb62d97ec7acd7082a1c66e47d548d803a5f431bd36ae431972bb439c5b /exec failed with exit code 1
- Test CUDA Builds / test-model-cuda-e2e (openai, whisper-large-v3-turbo, quantized-int4-tile-packed) / linux-job (gh)
  RuntimeError: Command docker exec -t 44e8dc9bb0bd25aeec97801ad7169c81e416b57da216ece37849b4564ac3aa94 /exec failed with exit code 1
- Test CUDA Builds / test-model-cuda-e2e (openai, whisper-small, quantized-int4-tile-packed) / linux-job (gh)
  RuntimeError: Command docker exec -t 91c62d8959f3759f41e3a1955ed27434b683a2f04050b290fe7bd7ffb6a70d26 /exec failed with exit code 1
This comment was automatically generated by Dr. CI and updates every 15 minutes.
This PR needs a release notes: label
If your change should be included in the release notes (i.e., would users of this library care about this change?), please use a label starting with release notes:. This helps us track your change and include it in the next release notes.
To add a label, comment to pytorchbot, for example:
@pytorchbot label "release notes: none"
For more information, see https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.
Is it possible to make this part of the Inductor freezing optimization? I think it would benefit all models, beyond ET use cases.
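For context on what a fused int4 kernel decodes: int4 weight-only quantization stores two 4-bit values per byte, and a fused kernel unpacks them on the fly instead of materializing a full-precision weight tensor. A minimal pure-Python sketch of that packing layout (helper names are mine, not from this PR; low nibble holds the even-indexed value):

```python
def pack_int4(values):
    """Pack pairs of 4-bit unsigned ints (0..15) into bytes, low nibble first."""
    assert len(values) % 2 == 0, "int4 packing expects an even element count"
    return bytes(
        (values[i] & 0xF) | ((values[i + 1] & 0xF) << 4)
        for i in range(0, len(values), 2)
    )

def unpack_int4(packed):
    """Inverse of pack_int4: expand each byte back into two 4-bit values."""
    out = []
    for b in packed:
        out.append(b & 0xF)   # low nibble = even-indexed value
        out.append(b >> 4)    # high nibble = odd-indexed value
    return out

# Round-trip check: packing then unpacking recovers the original values.
vals = [1, 15, 7, 0]
assert unpack_int4(pack_int4(vals)) == vals
```

A fused CUDA kernel would perform the `unpack_int4` step (plus dequantization with scales/zero points) in registers inside the matmul, which is why folding the packing into a graph-level optimization such as Inductor freezing is attractive.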