executorch Qualcomm AI Engine Direct

Summary

Remove alias_copy op.

Test plan

Add UTs to ensure alias_copy is removed.

Apr 21 '25 03:04 winskuo-quic

:link: Helpful Links

:test_tube: See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/10319

:page_facing_up: Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

:x: 1 New Failure

As of commit 4b587c0fed1754fd5a492d34ddf6825daf50a51e with merge base 4e38f4ac541d0e81cd7b8f62660010eb7ca09a5e ():

NEW FAILURE - The following job has failed:

pull / test-llava-runner-linux / linux-job (gh) RuntimeError: Command docker exec -t 92c37ffcdda88d206e2c3b3eef3e8691e272d1a47ad1f315c6d306e7b8682a1d /exec failed with exit code 139

This comment was automatically generated by Dr. CI and updates every 15 minutes.

Apr 21 '25 03:04 pytorch-bot[bot]

Hi @cccclai, @billmguo, This PR is the remove alias_copy operation. Please have a look. Thanks

Apr 21 '25 03:04 winskuo-quic

@cccclai has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

Apr 22 '25 05:04 facebook-github-bot

It seems like the test is failing

AssertionError: False is not true : ref_output:
tensor([[ 0.5914, -0.8622,  0.0214, -0.3349,  0.0285, -0.9833, -1.2256,  0.3349,
         -0.7197,  0.3278]])

model_output:
tensor([[ 0.5914,  0.5914, -1.2185, -1.2185,  0.5914, -1.2185, -1.2185, -1.2185,
         -1.2185, -1.2185]])

Apr 22 '25 16:04 cccclai

It seems like the test is failing

AssertionError: False is not true : ref_output:
tensor([[ 0.5914, -0.8622,  0.0214, -0.3349,  0.0285, -0.9833, -1.2256,  0.3349,
         -0.7197,  0.3278]])

model_output:
tensor([[ 0.5914,  0.5914, -1.2185, -1.2185,  0.5914, -1.2185, -1.2185, -1.2185,
         -1.2185, -1.2185]])

Thanks for sharing the results. I think this is failing by chances. I tested locally a couple times, and it all passed. I have changed to a simpler OP instead since the purpose here is to verify alias_copy is handled properly. Would you mind if I push another PR for UT refactor to add a new class for passes related stuff later on? I think we will need to discuss internally first on how to refactor the UT classes. Thanks

Apr 23 '25 13:04 winskuo-quic

Actually the failing might due to this change https://github.com/pytorch/executorch/pull/10362, can you help checking if the test is still passing with https://github.com/pytorch/pytorch/pull/151436?

Apr 23 '25 14:04 cccclai

I have been looking at the failure and still haven't made much progress. It looks like qualcomm passes do something special to exported artifact state dicts down the line. Is it accurate? Can you point to me places where you deal with module state dicts?

Apr 23 '25 20:04 tugsbayasgalan

Actually the failing might due to this change #10362, can you help checking if the test is still passing with pytorch/pytorch#151436?

I have been looking at the failure and still haven't made much progress. It looks like qualcomm passes do something special to exported artifact state dicts down the line. Is it accurate? Can you point to me places where you deal with module state dicts?

Hi @cccclai, @tugsbayasgalan,

Yes I am able to reproduce the issue after bumping the torch nightly version to "dev20250422".

I believe the reason is that we are trying to convert the quant weights back to floating point during https://github.com/pytorch/executorch/blob/d31ef13a463d2bd20ec7699bbfdbd570b4f318c4/backends/qualcomm/_passes/annotate_quant_attrs.py#L144 However, after the bumping the torch version, it seems like the FP weights are not properly saved. We are using the following functions to get and set parameters. https://github.com/pytorch/executorch/blob/d31ef13a463d2bd20ec7699bbfdbd570b4f318c4/backends/qualcomm/builders/utils.py#L30 https://github.com/pytorch/executorch/blob/d31ef13a463d2bd20ec7699bbfdbd570b4f318c4/backends/qualcomm/builders/utils.py#L47

It would be appreciated if you could share the best practices on handling parameters modification. Thanks

Apr 24 '25 03:04 winskuo-quic

I am still confused how this logic could break your flow tho. Before returning state dict, i just shallow copy in export meaning they should be the same state dict lol. In what part of set_parameter does it fail?

Apr 24 '25 14:04 tugsbayasgalan

I am still confused how this logic could break your flow tho. Before returning state dict, i just shallow copy in export meaning they should be the same state dict lol. In what part of set_parameter does it fail?

I think I might know why it is not updating, but I need some time to verify if it is actually the root cause as I am currently blocked by something else. I think when we are initializing the passes, we passed the exported_program into the constructor. https://github.com/pytorch/executorch/blob/9e59c198dde0f4d4c4bb714e7f1de3b389cfc577/backends/qualcomm/_passes/qnn_pass_manager.py#L160 Next, during the lowering process, when decomposition is called, we get a new exported_program with a shallow copy new_state_dict. https://github.com/pytorch/pytorch/blob/ad81eeb7c7c906e0cdd04a5cc8fdb9592281c317/torch/export/exported_program.py#L880 Then, when we are running the passes, we are updating our values to old state_dict in the old exported_program. This is why we keep on getting incorrect params during QNN op builder.

Please let me know if there's anything that is still unclear. Thanks

Apr 25 '25 13:04 winskuo-quic

Ahh this makes sense. You should always work with state_dict of ep after running decompositions because the state dict can change after decompositions.

Apr 25 '25 14:04 tugsbayasgalan

@winskuo-quic Just checking in, will you be working on making the qualcomm code compatible with new export behaviour?

Apr 29 '25 15:04 tugsbayasgalan

@winskuo-quic Just checking in, will you be working on making the qualcomm code compatible with new export behaviour?

Hi @tugsbayasgalan, Thanks for the suggestions. We will work on Qualcomm code to make it compatible with new export behavior. As you mentioned, it is probably safer to work on state dict of the ep after decomposition.

Apr 30 '25 03:04 winskuo-quic

Actually the failing might due to this change #10362, can you help checking if the test is still passing with pytorch/pytorch#151436?

Any update on this change? https://github.com/pytorch/executorch/pull/10362 is currently pending due to the breakage

May 12 '25 17:05 cccclai

Actually the failing might due to this change #10362, can you help checking if the test is still passing with pytorch/pytorch#151436?

Any update on this change? #10362 is currently pending due to the breakage

Sorry for the delay on pushing the PR. We came up with couple solutions and were discussing internally which one is the best to resolve the uplevel error. We will share the PR later today.

May 13 '25 01:05 winskuo-quic

Given that CI is green again, we can resume merging PRs now. Mind rebasing? This seems to be the oldest PR that needs to land

May 21 '25 22:05 cccclai

Given that CI is green again, we can resume merging PRs now. Mind rebasing? This seems to be the oldest PR that needs to land

Just rebased. Thanks for reminder

May 22 '25 05:05 winskuo-quic

@cccclai has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

May 22 '25 18:05 facebook-github-bot

Qualcomm AI Engine Direct - alias_copy op

Summary

Test plan

:link: Helpful Links

:test_tube: See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/10319

:x: 1 New Failure