AOTInductor cpp_wrapper: fix output code interception
Stack from ghstack (oldest at bottom):
- #140620
- #141176
- #141175
- -> #141174
Ensure that only the second (final) run of output code generation on GPU actually gets returned. This fixes cases where callers assume a single forward and backward pass's worth of code.
cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire @chauhang @aakhundov
:heavy_exclamation_mark: 1 Active SEV
There is 1 currently active SEV. If your PR is affected, please view it below:
:x: 1 New Failure
As of commit 645e38bed04dc65cfcabcf407a30af67d2264ebd with merge base 740d1eb0306f1f9d0ce81ea81f287a6b52738fab:
NEW FAILURE - The following job has failed:
- inductor / unit-test / cuda12.1-py3.10-gcc9-sm86 / test (inductor_cpp_wrapper, 1, 1, linux.g5.4xlarge.nvidia.gpu) (gh)
inductor/test_torchinductor.py::GPUTests::test_conv_inference_heuristics_cuda
This comment was automatically generated by Dr. CI and updates every 15 minutes.
@desertfire I'm not entirely sure whether this passes CI yet (new failures keep popping up), but I wanted your input on the approach of this PR. It seems to be six of one, half a dozen of the other. Either:
a) we only log the output code from the final run of the GPU cpp_wrapper codegen, and then have to update all the tests checking for triton-specific code in the output, or
b) we log the output code from both runs, and then have to update all places that assume only a single kernel's worth of forward and backward pass code will be returned.
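To make option (a) concrete, here is a minimal sketch of what intercepting only the final codegen pass could look like. This is illustrative only: the names (`run_codegen_pass`, `NUM_CPP_WRAPPER_PASSES`, `generate_output_code`) are hypothetical stand-ins, not the actual `torch._inductor` API, and the real two-pass cpp_wrapper flow is considerably more involved.

```python
# Hypothetical sketch of option (a): GPU cpp_wrapper codegen runs twice
# (first pass emitting the Triton/Python wrapper, second pass emitting the
# C++ wrapper), and only the final pass's output code is recorded, so
# callers that expect a single FW + BW pass's worth of code see one entry.
# None of these names come from the actual torch._inductor implementation.

NUM_CPP_WRAPPER_PASSES = 2  # GPU cpp_wrapper compiles in two passes


def run_codegen_pass(graph_name: str, pass_idx: int) -> str:
    # Stand-in for one compilation pass; the real codegen lives in
    # torch._inductor and produces far more than a one-line string.
    kind = "triton" if pass_idx == 0 else "cpp"
    return f"// {kind} wrapper code for {graph_name}"


def generate_output_code(graph_name: str) -> str:
    output = None
    for i in range(NUM_CPP_WRAPPER_PASSES):
        code = run_codegen_pass(graph_name, i)
        # Only intercept the code from the final pass; the first pass's
        # Triton-flavored output is discarded rather than logged.
        if i == NUM_CPP_WRAPPER_PASSES - 1:
            output = code
    return output
```

The trade-off described above follows directly: with this approach, tests that grep the returned output for Triton-specific code would need updating, since only the C++ wrapper from the final pass survives.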
I am recycling my old PR for the one-pass implementation. I think you can work on other issues while waiting for my PR to land. I will link my PR here when it's ready.
@desertfire Sounds good! I'll rebase this out of the stack, and hopefully everything else will pass.