[IROptimizer] Further remove useless copy instructions
One example we can optimize is this graph:

`Concat(inp1, inp2, ..., inpN)`

In the particular case where the `Concat` node concatenates contiguous slices (e.g. the concatenation is done along the 1st dimension), the operator is translated at the IR level into copy instructions: each input slice is copied as a contiguous block into the output buffer.

To remove these copy instructions, we could instead modify the IR such that the producers of `inp1`, `inp2`, ... write directly into the slice portions of the final `Concat` output buffer.
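A minimal C++ sketch of the idea (illustrative only, not Glow code; `concatWithCopies`, `producer`, and `concatWithoutCopies` are made-up names):

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Current lowering: each input lives in its own buffer and Concat copies it
// into the right offset of the output. These are the copies we want to drop.
void concatWithCopies(const std::vector<std::vector<float>> &inputs,
                      std::vector<float> &out) {
  size_t offset = 0;
  for (const auto &in : inputs) {
    std::copy(in.begin(), in.end(), out.begin() + offset); // "useless" copy
    offset += in.size();
  }
}

// Stand-in for whatever computes one concat input.
void producer(float *dst, size_t size) {
  for (size_t i = 0; i < size; ++i)
    dst[i] = 0.f; // the producer's computed values
}

// Desired lowering: each producer is handed a view (base + offset) into the
// final output buffer and writes its result there directly, so no copy
// instruction is needed at all.
void concatWithoutCopies(std::vector<float> &out,
                         const std::vector<size_t> &sliceSizes) {
  size_t offset = 0;
  for (size_t size : sliceSizes) {
    producer(out.data() + offset, size); // write straight into the slice
    offset += size;
  }
}
```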
@opti-mix Do you know how to do this easily and elegantly? I wasted two days trying to write an IR optimization pass which does the above but I got nowhere ... it just seems too complicated. I need this optimization pass for a particular benchmark which becomes unattractive because of the overhead of these useless copies.
@mciprian13 Just to check that I understand you correctly. You'd like to modify the IR so that the producers of `inpN` write into a tensorview, which is the slice of the final `Concat` buffer, correct?
IIRC, Glow already has a similar optimization in the IROptimizer, called `optimizeInserts`, no? But maybe it is not general enough for your case.
@opti-mix Yeah, I've seen that optimization, but it seems it is not generic enough and does not kick in for my case. https://github.com/pytorch/glow/blob/39a8c689f252076ff5842c1870523b420e509b72/lib/Optimizer/IROptimizer/IROptimizer.cpp#L1347-L1352
This optimization is used only if the `InsertTensor` has an `allocActivation` as a source. In my case the sources of the `InsertTensor`s are `TensorView`s because the concat inputs were Reshape nodes.
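Schematically, the restriction looks something like this (a toy paraphrase with hypothetical types, not the actual code behind the link above):

```cpp
// Toy model of the restriction (hypothetical types; NOT Glow's API). The
// pass only retargets an InsertTensor whose source is a fresh allocation,
// because a fresh buffer has no aliases and its producer can be redirected
// safely. A tensorview source aliases another buffer, so the pass
// conservatively skips it.
struct Value { virtual ~Value() = default; };
struct AllocActivation : Value {}; // fresh buffer, single owner
struct TensorView : Value {};      // aliases a region of another buffer

struct InsertTensor { Value *src; Value *dest; };

bool optimizeInsertsWouldFire(const InsertTensor &it) {
  // Mirrors the spirit of the guard: only alloc sources are handled.
  return dynamic_cast<AllocActivation *>(it.src) != nullptr;
}
```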
I guess later in the IR optimizer pipeline the `InsertTensor` instructions are transformed into `Copy` instructions. Basically what I end up with is copy instructions having `TensorView`s as both input and output.
So basically it would be nice to have some sort of optimization that removes `Copy` instructions more aggressively.
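As a rough sketch, such a pass might look like this over a toy IR (hypothetical types and names; a real pass would need Glow's liveness and aliasing checks, which are elided here):

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Toy IR sketching a tensorview-aware copy elimination:
// copy(%dstView, %srcView) dies if the single producer of the source
// buffer can be retargeted to write through %dstView instead.
struct Buffer;
struct View { Buffer *base; size_t offset; size_t size; };

struct Instr {
  enum Kind { Produce, Copy } kind;
  View *dst = nullptr; // where the instruction writes
  View *src = nullptr; // Copy only: where it reads from
};

// Returns true (and rewrites) when elimination is safe under the toy rules:
// the copy's source buffer has exactly one writer.
bool tryEliminateCopy(std::vector<Instr *> &instrs, Instr *copy) {
  if (copy->kind != Instr::Copy) return false;

  // Find the unique producer of the source buffer.
  Instr *producer = nullptr;
  for (Instr *i : instrs) {
    if (i != copy && i->dst && i->dst->base == copy->src->base) {
      if (producer) return false; // more than one writer: give up
      producer = i;
    }
  }
  if (!producer) return false;

  // (A real pass must also prove that nothing else reads the source buffer
  // after the copy and that the destination buffer is live across the
  // producer; both checks are elided here.)

  // Retarget the producer to write directly through the copy's destination
  // view, then delete the now-dead copy.
  producer->dst = copy->dst;
  instrs.erase(std::remove(instrs.begin(), instrs.end(), copy), instrs.end());
  return true;
}
```

The hard part is presumably proving those safety conditions once views can alias each other, which is probably why `optimizeInserts` restricts itself to fresh allocations.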
@mciprian13 I see. Yes, seems like a generalized copy elimination that is aware of tensor views would be needed. Do you happen to have a small instruction IR-level unit test to reproduce the issue? It could be useful while thinking about a solution.
@opti-mix I can provide a model for which this happens: IROptModel.zip. In the archive you can find:
- A MobileNet SSD model in ONNX format (a publicly available model taken from the ONNX model zoo)
- The CLI command to compile the model using the `model-compiler` Glow tool and dump the Glow IR:
model-compiler -backend=CPU -model=mobilenet_v1_0.75_ssd.onnx -emit-bundle=bundle -dump-ir > model_ir.txt
- The IR file `model_ir.txt` dumped by the above command
In the IR file `model_ir.txt` you can find 12 `Copy` instructions which are very expensive when everything else in the graph is executed by a powerful accelerator. You will also see that all those `Copy` instructions have `TensorView`s as both input and output.
Let me know what solution you would propose for this.
Thanks!
@opti-mix Did you have time to investigate this optimization?
@mciprian13 Sorry, I was busy with some other urgent stuff. Haven't spent any reasonable time on this yet.
@opti-mix Ok, no problem. Btw, do you think it would be worth organizing some meetings with all the Glow contributors to exchange/share ideas about Glow's future, identify groups of people with common interests who could collaborate, or for other purposes? WDYT?