onnx-mlir

Better mapping of zAIU ops

Open AlexandreEichenberger opened this issue 1 year ago • 0 comments

Our current mapping of ONNX to zAIU ops is as follows:

  1. Map all zAIU-eligible operations to the zAIU greedily, adding stick/unstick for inputs/outputs, then removing redundant stick/unstick or unstick/stick pairs.
  2. Migrate zAIU add/sub/mul/div ops back to the CPU if they appear in the pattern stick -> zAIU (add/sub/mul/div) -> unstick, as this allows us to get rid of a stick/unstick pair. This is currently only done under the optional --enable-zhigh-to-onnx flag.
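
Step 1 above can be sketched on a toy model. This is only an illustration, not the onnx-mlir implementation: the graph is reduced to a linear chain of op names, the `ZAIU_ELIGIBLE` set is a hypothetical sample, and `greedy_map` / `remove_redundant_pairs` are made-up helper names.

```python
# Toy model: a graph is a linear chain of op names (real graphs are DAGs).
# ZAIU_ELIGIBLE is an illustrative sample, not the actual eligibility list.
ZAIU_ELIGIBLE = {"MatMul", "Add", "Sub", "Mul", "Div", "Relu"}


def greedy_map(ops):
    """Wrap every eligible op in its own stick/unstick pair."""
    out = []
    for op in ops:
        if op in ZAIU_ELIGIBLE:
            out += ["stick", f"zAIU.{op}", "unstick"]
        else:
            out.append(f"CPU.{op}")
    return out


def remove_redundant_pairs(seq):
    """Cancel directly adjacent stick/unstick or unstick/stick pairs."""
    out = []
    for tok in seq:
        if out and {out[-1], tok} == {"stick", "unstick"}:
            out.pop()  # the two layout conversions cancel each other
        else:
            out.append(tok)
    return out


chain = ["MatMul", "Relu", "Reshape", "Add"]
print(remove_redundant_pairs(greedy_map(chain)))
```

In this toy chain, the unstick/stick pair between MatMul and Relu is removed, while the trailing Add keeps its own stick/unstick pair, which is the case Step 2 targets.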

One issue with this approach is that we often map operations to the zAIU that are too small to benefit from the accelerator. This can be determined by a cost model that estimates, for each operation individually, whether it is better run on the CPU or on the accelerator.
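
A per-operation cost model of this kind could look roughly like the sketch below. Everything in it is an assumption for illustration: the linear cost formulas, the `CostParams` fields, and all constants are placeholders, not measured zAIU or CPU numbers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CostParams:
    # Placeholder constants chosen for illustration only (not measurements).
    cpu_ns_per_elem: float = 1.0
    accel_ns_per_elem: float = 0.1
    accel_fixed_ns: float = 5000.0  # assumed fixed cost per accelerator call


def cpu_cost(num_elems, p):
    return num_elems * p.cpu_ns_per_elem


def accel_cost(num_elems, p):
    return p.accel_fixed_ns + num_elems * p.accel_ns_per_elem


def prefers_accelerator(num_elems, p=CostParams()):
    """True if this op, considered on its own, is estimated faster on the accelerator."""
    return accel_cost(num_elems, p) < cpu_cost(num_elems, p)


for n in (1_000, 10_000):
    print(n, prefers_accelerator(n))
```

With these placeholder constants, small ops stay on the CPU because the fixed accelerator cost dominates. Note that the estimate ignores neighboring ops, so it cannot see when a stick/unstick pair would be saved.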

While I agree that, in general, a graph partitioning algorithm might be the most flexible approach, I wonder if we can improve our greedy algorithm with a simple modification:

  1. Map beneficial zAIU-eligible operations to the zAIU greedily, adding stick/unstick for inputs/outputs, then removing redundant stick/unstick or unstick/stick pairs.
  2. Migrate zAIU-eligible CPU ops to the accelerator, regardless of the cost model, if they exhibit the pattern unstick -> CPU op -> stick, as this allows us to remove an unstick/stick pair.
  3. Migrate zAIU add/sub/mul/div ops back to the CPU if they appear in the pattern stick -> zAIU (add/sub/mul/div) -> unstick, as this allows us to get rid of a stick/unstick pair.
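
The two migration rules (Steps 2 and 3) can be sketched as a single pass over a toy linear chain. This is a hypothetical illustration, not onnx-mlir code: `migrate`, the token format, and the eligibility sets are assumptions, and a chain ignores that ops such as add have several inputs.

```python
# Illustrative sets; the real eligibility rules live in the compiler.
ELEMENTWISE = {"Add", "Sub", "Mul", "Div"}
ZAIU_ELIGIBLE = ELEMENTWISE | {"MatMul", "Relu"}


def migrate(seq):
    """Apply both migration rules in one left-to-right pass over a chain."""
    out, i = [], 0
    while i < len(seq):
        if i + 2 < len(seq):
            before, op, after = seq[i:i + 3]
            device, _, name = op.partition(".")
            # Step 2: unstick -> CPU op -> stick becomes a zAIU op.
            if (before, after) == ("unstick", "stick") and device == "CPU" \
                    and name in ZAIU_ELIGIBLE:
                out.append(f"zAIU.{name}")
                i += 3
                continue
            # Step 3: stick -> zAIU elementwise -> unstick becomes a CPU op.
            if (before, after) == ("stick", "unstick") and device == "zAIU" \
                    and name in ELEMENTWISE:
                out.append(f"CPU.{name}")
                i += 3
                continue
        out.append(seq[i])
        i += 1
    return out


# Relu was left on the CPU by a (hypothetical) cost model in Step 1.
chain = ["stick", "zAIU.MatMul", "unstick", "CPU.Relu", "stick", "zAIU.MatMul",
         "unstick", "CPU.Reshape", "stick", "zAIU.Add", "unstick"]
print(migrate(chain))
```

In this example the Relu moves to the accelerator and the isolated Add moves back to the CPU, so the chain goes from three stick/unstick pairs to one.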

Steps 2 and 3 could possibly be done together. Additionally, in Step 1, we should remove the unstick/stick pair in unstick -> CPU op -> stick for ops that can be performed on the CPU in stickified format (I am thinking of all the data movement ops).

AlexandreEichenberger, Sep 11 '23 20:09