
[IROptimizer] Further remove useless copy instructions

Open mciprian13 opened this issue 3 years ago • 8 comments

One example we could optimize is the graph Concat(inp1, inp2, ..., inpN). For the particular case where the Concat node concatenates contiguous slices (e.g. the concatenation is done along the 1st dimension), the operator translates at the IR level into copy instructions: each input slice is copied as a contiguous block into the output buffer. To remove these copy instructions, we could instead modify the IR so that the producers of inp1, inp2, ..., inpN write directly into the corresponding slice regions of the final Concat output buffer.
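
To illustrate with plain C++ (not Glow code; produce() is just a stand-in for an arbitrary producer node), here is the difference between the two schemes:

    #include <algorithm>
    #include <cstddef>
    #include <vector>

    // Stand-in for an arbitrary producer node: fills out[0..len) with values.
    void produce(float *out, std::size_t len, float seed) {
      for (std::size_t i = 0; i < len; ++i)
        out[i] = seed + static_cast<float>(i);
    }

    int main() {
      const std::size_t sliceLen = 1024;
      const std::size_t numSlices = 4;
      std::vector<float> concatOut(sliceLen * numSlices);

      // Copy-based concat (what the IR does today): each producer writes
      // into its own temporary buffer, then a copy instruction moves the
      // slice into place.
      for (std::size_t s = 0; s < numSlices; ++s) {
        std::vector<float> tmp(sliceLen);          // the allocactivation
        produce(tmp.data(), sliceLen, float(s));   // the producer of inpS
        std::copy(tmp.begin(), tmp.end(),          // the useless copy
                  concatOut.begin() + s * sliceLen);
      }

      // Copy-free concat (the proposal): each producer writes directly
      // into its contiguous slice of the Concat output buffer.
      for (std::size_t s = 0; s < numSlices; ++s)
        produce(concatOut.data() + s * sliceLen, sliceLen, float(s));
      return 0;
    }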

mciprian13 avatar Mar 03 '21 19:03 mciprian13

@opti-mix Do you know how to do this easily and elegantly? I wasted 2 days trying to write an IR optimization pass which does the above but got nowhere ... it just seems too complicated. I need this optimization pass for a particular benchmark whose performance suffers because of the overhead of these useless copies.

mciprian13 avatar Mar 03 '21 19:03 mciprian13

@mciprian13 Just to check that I understand you correctly: you'd like to modify the IR so that the producers of inp1 ... inpN write into tensorviews, which are slices of the final Concat buffer, correct?

IIRC, Glow already has a similar optimization in the IROptimizer, called optimizeInserts, no? But maybe it is not general enough for your case.

opti-mix avatar Mar 03 '21 19:03 opti-mix

@opti-mix Yeah, I've seen that optimization but it seems it is not generic enough and does not kick in for my case: https://github.com/pytorch/glow/blob/39a8c689f252076ff5842c1870523b420e509b72/lib/Optimizer/IROptimizer/IROptimizer.cpp#L1347-L1352

That optimization kicks in only if the InsertTensor has an allocActivation as its source. In my case the sources of the InsertTensors are TensorViews, because the concat inputs were Reshape nodes. I guess that later in the IR optimizer pipeline the InsertTensor instructions are transformed into Copy instructions, so what I end up with is copy instructions having TensorViews as both input and output. It would be nice to have some sort of optimization that removes Copy instructions more aggressively.
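
What I was trying to write is roughly the sketch below. The two helper functions are hypothetical and hide exactly the hard parts (liveness of the buffers, aliasing between the views, write ordering), which is where I got stuck:

    #include "glow/IR/IR.h"
    #include "glow/IR/Instrs.h"
    #include "llvm/Support/Casting.h"

    using namespace glow;

    // Hypothetical helpers -- this is where the real work (liveness,
    // aliasing between the views, write-ordering checks) would live.
    bool canRedirectProducer(Value *src, Value *dest);
    void redirectProducerDest(Value *src, Value *dest);

    // Sketch of a tensorview-aware copy elimination: for every
    // "copy @out %dstView, @in %srcView" where both operands are
    // tensorviews, try to make the producer of %srcView write into
    // %dstView directly.
    void eliminateTensorViewCopies(IRFunction &M) {
      for (auto &I : M.getInstrs()) {
        auto *copy = llvm::dyn_cast<CopyInst>(&I);
        if (!copy)
          continue;
        Value *src = copy->getSrc();
        Value *dest = copy->getDest();
        // The pattern from this issue: both operands are tensorviews.
        if (!llvm::isa<TensorViewInst>(src) ||
            !llvm::isa<TensorViewInst>(dest))
          continue;
        if (canRedirectProducer(src, dest)) {
          redirectProducerDest(src, dest);
          // A real pass would erase `copy` here, deferred so that the
          // instruction list iterator is not invalidated.
        }
      }
    }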

mciprian13 avatar Mar 04 '21 14:03 mciprian13

@mciprian13 I see. Yes, it seems like a generalized copy elimination that is aware of tensorviews would be needed. Do you happen to have a small IR-level unit test that reproduces the issue? It could be useful while thinking about a solution.
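
Even a rough sketch would do. I would imagine something like the following built with IRBuilder (method names written from memory, so treat the signatures as approximate):

    #include "glow/Graph/Graph.h"
    #include "glow/IR/IR.h"
    #include "glow/IR/IRBuilder.h"

    using namespace glow;

    // Approximate reproduction: a tensorview-of-alloc source copied into
    // a tensorview slice of the "concat" output buffer.
    void buildRepro(Module &mod) {
      Function *F = mod.createFunction("copyElim");
      IRFunction M(F);
      IRBuilder bb(&M);

      auto *in = bb.createAllocActivationInst("in", ElemKind::FloatTy, {2, 64});
      auto *out = bb.createAllocActivationInst("out", ElemKind::FloatTy, {4, 64});

      // A view over the source (as left behind by the lowered Reshape)
      // and a view over the first half of the concat output buffer.
      auto *viewTy = mod.uniqueType(ElemKind::FloatTy, {2, 64});
      auto *srcView = bb.createTensorViewInst("src.view", in, viewTy, {0, 0});
      auto *dstView = bb.createTensorViewInst("dst.view", out, viewTy, {0, 0});

      // The copy we would like the optimizer to eliminate.
      bb.createCopyInst("copy", dstView, srcView);
    }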

opti-mix avatar Mar 04 '21 16:03 opti-mix

@opti-mix I can provide a model for which this happens: IROptModel.zip. The archive contains:

  • A MobileNet SSD model in ONNX format (a publicly available model taken from the ONNX model zoo)
  • The CLI command to compile the model with the model-compiler Glow tool and dump the Glow IR:
    model-compiler -backend=CPU -model=mobilenet_v1_0.75_ssd.onnx -emit-bundle=bundle -dump-ir > model_ir.txt
    
  • The IR file model_ir.txt dumped by the above command

In the IR file model_ir.txt you will find 12 Copy instructions, which are very expensive when everything else in the graph is executed by a powerful accelerator. You will also see that all those Copy instructions have TensorViews as both input and output (see the schematic pattern below). Let me know what solution you would see for this. Thanks!
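
For reference, each of those copies has schematically this shape in the dump (names illustrative, not the exact dump syntax):

    %src.view = tensorview @in %producer.out { Ty: float<...>, Offsets: [...] }
    %dst.view = tensorview @in %concat.out { Ty: float<...>, Offsets: [...] }
    %copy.N = copy @out %dst.view, @in %src.view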

mciprian13 avatar Mar 04 '21 17:03 mciprian13

@opti-mix Did you have time to investigate this optimization?

mciprian13 avatar Apr 09 '21 16:04 mciprian13

@mciprian13 Sorry, I was busy with some other urgent stuff. I haven't been able to spend any real time on this yet.

opti-mix avatar Apr 09 '21 16:04 opti-mix

@opti-mix Ok, no problem. Btw, do you think it would be worth organizing some meetings with all the Glow contributors to exchange ideas about Glow's future, identify groups of people with common interests who could collaborate, and so on? WDYT?

mciprian13 avatar Apr 09 '21 16:04 mciprian13