Ben Vanik
yeah those may be unrelated, the conv ones are element diffs though - @ScottTodd can probably provide the files/instructions (may be close to landing https://github.com/openxla/iree/issues/16372#issuecomment-1965464703)
whoa, first legit find from the suite and it hasn't landed yet! *high five*
thanks for the repro. it's much easier to see what we need to do before scheduling execution and nesting things - ElideAsyncCopies runs very early on in the pipeline.
your "linalg level llama IR" seems to be torch, and running it through iree-compile doesn't seem to convert out of torch? do you have compile commands that work with that?...
There are some failing gather tests too, perhaps they are useful?
```
FAILED iree_tests/onnx/node/generated/test_gather_0/model.mlir::test_gather_0
FAILED iree_tests/onnx/node/generated/test_gather_1/model.mlir::test_gather_1
FAILED iree_tests/onnx/node/generated/test_gather_2d_indices/model.mlir::test_gather_2d_indices
FAILED iree_tests/onnx/node/generated/test_gather_elements_negative_indices/model.mlir::test_gather_elements_negative_indices
```
actually, I don't care - can you just post the results of an iree-compile --compile-to=flow? that's a better starting point before dealing with stream passes (the before allocation one you...
which link?
neat, elide async copies is already getting rid of the clones on each variable update:
```mlir
%869 = arith.muli %_global_seq_step.global, %c8192 : index
%870 = stream.async.slice %63[%c0 to %869] :...
```
ok tweaked emplace allocations, so now the updates are slow memcpyed into place:
```mlir
%1818 = arith.addi %869, %c8192 : index
%1819 = stream.async.dispatch @run_forward_dispatch_804::@run_forward_dispatch_804_slow_memcpy[%_global_seq_step.global, %1019](%_global_seq_step.global, %1024[%c0 to %1020 for...
```
good idea - a pass converting out of that form to the same kind of thing you're doing with onnx seems like the minimal change set and pattern for even...