Ben Vanik comments

Results: 416 comments of Ben Vanik

Bad dispatch outputs from SDXL VAE

yeah those may be unrelated, the conv ones are element diffs though - @ScottTodd can probably provide the files/instructions (may be close to landing https://github.com/openxla/iree/issues/16372#issuecomment-1965464703)

Bad dispatch outputs from SDXL VAE

whoa, first legit find from the suite and it hasn't landed yet! *high five*

[stream] eliding async slices

thanks for the repro, it's much easier to see what we need to do before scheduling execution and nesting things - ElideAsyncCopies runs very early on in the pipeline.

[stream] eliding async slices

your "linalg level llama IR" seems to be torch, and running it through iree-compile doesn't seem to convert out of torch? do you have compile commands that work with that?...

Bad dispatch outputs from SDXL VAE

There are some failing gather tests too, perhaps they are useful?

```
FAILED iree_tests/onnx/node/generated/test_gather_0/model.mlir::test_gather_0
FAILED iree_tests/onnx/node/generated/test_gather_1/model.mlir::test_gather_1
FAILED iree_tests/onnx/node/generated/test_gather_2d_indices/model.mlir::test_gather_2d_indices
FAILED iree_tests/onnx/node/generated/test_gather_elements_negative_indices/model.mlir::test_gather_elements_negative_indices
```

[stream] eliding async slices

actually, I don't care - can you just post the results of an iree-compile --compile-to=flow? that's a better starting point before dealing with stream passes (the before allocation one you...

[stream] eliding async slices

which link?

[stream] eliding async slices

neat, elide async copies is already getting rid of the clones on each variable update: ```mlir %869 = arith.muli %_global_seq_step.global, %c8192 : index %870 = stream.async.slice %63[%c0 to %869] :...

[stream] eliding async slices

ok tweaked emplace allocations, so now the updates are slow memcpyed into place: ```mlir %1818 = arith.addi %869, %c8192 : index %1819 = stream.async.dispatch @run_forward_dispatch_804::@run_forward_dispatch_804_slow_memcpy[%_global_seq_step.global, %1019](%_global_seq_step.global, %1024[%c0 to %1020 for...

Add new framework coverage in system level test suites

good idea - a pass converting out of that form to the same kind of thing you're doing with onnx seems like the minimal change set and pattern for even...