
Remove no-op _clone_dim_order ops

GregoryComer opened this issue 1 month ago • 5 comments

Clean up _clone_dim_order ops in the graph which don't change dim order.

cc @JacobSzwejbka @angelayi

GregoryComer avatar Nov 11 '25 17:11 GregoryComer

You can use RemoveCloneOpsTransform (https://github.com/pytorch/executorch/blob/main/backends/transforms/remove_clone_ops.py) to remove extra clones and dim order clones.
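For reference, a minimal sketch of running the transform standalone on a graph module, assuming it follows ExecuTorch's usual pass convention (callable on a `torch.fx.GraphModule`, returning a `PassResult`); see remove_clone_ops.py for the exact interface:

```python
from executorch.backends.transforms.remove_clone_ops import RemoveCloneOpsTransform

def strip_clone_ops(graph_module):
    # Assumed calling convention: the transform is invoked like an ExportPass
    # and hands back a PassResult whose graph_module has the redundant
    # clone / _clone_dim_order nodes removed.
    result = RemoveCloneOpsTransform()(graph_module)
    return result.graph_module if result is not None else graph_module
```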

Gasoonjia avatar Nov 13 '25 17:11 Gasoonjia

> You can use RemoveCloneOpsTransform (https://github.com/pytorch/executorch/blob/main/backends/transforms/remove_clone_ops.py) to remove extra clones and dim order clones.

Thanks. Do you have any objection to adding this to the to_edge flow?

GregoryComer avatar Nov 13 '25 17:11 GregoryComer

Yeah, I believe making it part of the to_edge flow might not be a good idea, or at least not enough: some "unused" clone-like ops may still be needed after to_backend, so removing them could introduce issues. Also, some unused clone-like ops may be introduced during to_backend, so running the cleanup only in the to_edge stage may not be enough.

Gasoonjia avatar Nov 13 '25 18:11 Gasoonjia

> Yeah, I believe making it part of the to_edge flow might not be a good idea, or at least not enough: some "unused" clone-like ops may still be needed after to_backend, so removing them could introduce issues. Also, some unused clone-like ops may be introduced during to_backend, so running the cleanup only in the to_edge stage may not be enough.

Can you elaborate a little more on this? Removing the no-ops before calling the partitioners should be effectively invisible to everything downstream. In to_backend, the partitioners and lowering code are bound to not alter the graph outside of the partition.

My largest concern is actually non-functional custom ops, since this is the only way the optimization could affect the graph semantics. Though, from discussion, this likely isn't a big concern.

Specifically, there is currently a bit of perf impact from this that I want to mitigate. I could add the cleanup to the XNNPACK backend (and other backends), but it seems nicer to just optimize the graph when going to edge dialect.
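For context, a hedged sketch of what doing this manually looks like today, without changing to_edge itself: apply the transform to the edge program before calling to_backend. The import paths and the `EdgeProgramManager.transform` / `to_backend` calls are the ones I believe are current, but double-check against main.

```python
import torch
from executorch.backends.transforms.remove_clone_ops import RemoveCloneOpsTransform
from executorch.backends.xnnpack.partition.xnnpack_partitioner import XnnpackPartitioner
from executorch.exir import to_edge

class Toy(torch.nn.Module):
    def forward(self, x):
        return torch.nn.functional.relu(x)

exported = torch.export.export(Toy(), (torch.randn(1, 4),))
edge = to_edge(exported)
# Strip redundant clones before partitioning, so the partitioners (and
# anything downstream of to_backend) never see them.
edge = edge.transform([RemoveCloneOpsTransform()])
edge = edge.to_backend(XnnpackPartitioner())
```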

GregoryComer avatar Nov 13 '25 19:11 GregoryComer

I see there was some discussion around this in https://github.com/pytorch/executorch/pull/12974. Is that what you're referring to? From my understanding, the risk is specifically around removing clones which modify dim order. So we should be safe so long as we only strip no-op clones (no dim order change). Is that in line with your understanding?
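To make the distinction concrete, here is a hypothetical check for "no-op". Reading the layout from `node.meta["val"]` and `Tensor.dim_order()` is an assumption for illustration, not necessarily how RemoveCloneOpsTransform decides this.

```python
# Hypothetical helper: only treat a _clone_dim_order node as removable when
# its output dim order matches its input's, i.e. the clone is a true no-op.
def is_noop_dim_order_clone(node) -> bool:
    if node.op != "call_function" or "_clone_dim_order" not in str(node.target):
        return False
    src = node.args[0]
    in_val = src.meta.get("val")
    out_val = node.meta.get("val")
    if in_val is None or out_val is None:
        return False  # no fake-tensor metadata; be conservative and keep it
    return list(in_val.dim_order()) == list(out_val.dim_order())
```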

GregoryComer avatar Nov 13 '25 19:11 GregoryComer

Solved for XNNPACK, at least, so the main issue is done. Still hoping to land the optimization for core.

GregoryComer avatar Dec 02 '25 17:12 GregoryComer