Remove no-op _clone_dim_order ops
Clean up _clone_dim_order ops in the graph which don't change dim order.
cc @JacobSzwejbka @angelayi
You can use RemoveCloneOpsTransform (https://github.com/pytorch/executorch/blob/main/backends/transforms/remove_clone_ops.py) to remove extra clone and dim order clone
You can use RemoveCloneOpsTransform (https://github.com/pytorch/executorch/blob/main/backends/transforms/remove_clone_ops.py) to remove extra clone and dim order clone
Thanks. Do you have any objection to adding this to the to_edge flow?
yeah i belive making it a part of to_edge flow might not be a good idea or not enough, since some "unused" clone-like op will be used after to_backend; removing them may introduce some issue. Also there might be some unused clone-like op being introduced during to_backend, only use that in to_edge stage may not be enough.
yeah i belive making it a part of to_edge flow might not be a good idea or not enough, since some "unused" clone-like op will be used after to_backend; removing them may introduce some issue. Also there might be some unused clone-like op being introduced during to_backend, only use that in to_edge stage may not be enough.
Can you elaborate a little more on this? Removing the no-ops before calling the partitioners should be effectively invisible to everything downstream. In to_backend, the partitioners and lowering code are bound to not alter the graph outside of the partition.
My largest concern is actually non-functional custom ops, since this is the only way the optimization could affect the graph semantics. Though, from discussion, this likely isn't a big concern.
Specifically, there is a bit of perf impact currently that I want to mitigate from this. I could add it to the XNNPACK backend (and other backends), but it seems nicer to just optimize graph when going to edge dialect.
I see there was some discussion around this in https://github.com/pytorch/executorch/pull/12974. Is that what you're referring to? From my understanding, the risk is specifically around removing clones which modify dim order. So we should be safe so long as we're only stripped no-op clones (no dim order change). Is that in line with your understanding?
Solved for XNNPACK, at least, so main issue is done. Still hoping to land the optimization for core.