Ean Garvey

87 comments by Ean Garvey

Thank you for the attention here @ScottTodd. SHARK tank is on its way out in favor of moving model tests/benchmarks to SHARK-Turbine. We should track any integrations of the Turbine...

@gpetters94 is this the model with sdpa decomposed at the torch.fx level? ~~If so, I have an updated repro with the iree_linalg_ext.attention version of CLIP that shows the attention dispatch 42...
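For reference, a minimal sketch of what "decomposed" means here, assuming standard PyTorch sdpa semantics (this is illustrative, not the actual torch.fx decomposition):

```
import math
import torch

def sdpa_decomposed(q, k, v):
    # Equivalent of torch.nn.functional.scaled_dot_product_attention
    # (no mask, no dropout): softmax(q @ k^T / sqrt(d)) @ v
    scale = 1.0 / math.sqrt(q.shape[-1])
    scores = torch.matmul(q, k.transpose(-2, -1)) * scale
    return torch.matmul(torch.softmax(scores, dim=-1), v)

q = k = v = torch.randn(1, 8, 64, 64)
ref = torch.nn.functional.scaled_dot_product_attention(q, k, v)
assert torch.allclose(sdpa_decomposed(q, k, v), ref, atol=1e-4)
```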

Actually, with `--iree-llvmcpu-distribution-size=32` added to my compile CLI, the full command is:

```
iree-compile .\stable_diffusion_xl_base_1_0_1024x1024_fp16_vae_decode.mlir --iree-input-type=auto --iree-vm-bytecode-module-output-format=flatbuffer-binary --iree-hal-target-backends=llvm-cpu --mlir-print-debuginfo=false --mlir-print-op-on-diagnostic=false --iree-opt-strip-assertions=true --verify=false --iree-llvmcpu-target-triple=x86_64-windows-msvc --iree-llvmcpu-target-cpu-features=host --iree-llvmcpu-enable-ukernels=all --iree-llvmcpu-fail-on-out-of-bounds-stack-allocation=false --iree-opt-const-expr-hoisting=False --iree-codegen-linalg-max-constant-fold-elements=9223372036854775807 -o stable_diffusion_xl_base_1_0_1024x1024_fp16_vae_decode_cpu.vmfb --iree-llvmcpu-distribution-size=32
```

I...
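If it's easier to script, the same flags should be forwardable through the iree-compiler Python package — a sketch assuming `compile_file` with `extra_args` (which I believe passes raw flags through):

```
import iree.compiler as ireec

# Hedged sketch: forward the same CLI flags via the Python bindings.
vmfb = ireec.compile_file(
    "stable_diffusion_xl_base_1_0_1024x1024_fp16_vae_decode.mlir",
    target_backends=["llvm-cpu"],
    extra_args=[
        "--iree-llvmcpu-target-triple=x86_64-windows-msvc",
        "--iree-llvmcpu-target-cpu-features=host",
        "--iree-llvmcpu-enable-ukernels=all",
        "--iree-llvmcpu-distribution-size=32",
    ],
)
with open("stable_diffusion_xl_base_1_0_1024x1024_fp16_vae_decode_cpu.vmfb", "wb") as f:
    f.write(vmfb)
```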

> @monorimet Alright, cool. I'll keep combing through the dispatches to find where the zeroes are coming from. (Should I close this and make another, you think?)

No, it's ok,...

@gpetters94 the first dispatch throwing out NaNs seems to be `@main_dispatch_244_conv_2d_nchw_fchw_1x128x1024x1024x128x3x3_f16`:

```
hal.executable public @main_dispatch_244 {
  hal.executable.variant public @embedded_elf_x86_64 target() {
    hal.executable.export public @main_dispatch_244_conv_2d_nchw_fchw_1x128x1024x1024x128x3x3_f16 ordinal(0) layout(#hal.pipeline.layout) {
    ^bb0(%arg0: !hal.device):
      %x,...
```
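One plausible (unconfirmed) culprit for NaNs out of an f16 conv like this is the reduction accumulating in fp16 rather than fp32: a 128x3x3 reduction only needs modest operands to blow past fp16's max of 65504, and a later `inf - inf` or `inf * 0` (e.g., in a norm) then turns into NaN. A quick numpy illustration of the numeric failure mode (not the actual codegen):

```
import numpy as np

# 128 channels * 3x3 window = 1152 terms in the reduction.
terms = np.full(128 * 3 * 3, 8.0, dtype=np.float16)

acc16 = np.float16(0.0)
for t in terms:
    acc16 += t * t          # fp16 accumulation: 1152 * 64 = 73728 > 65504
acc32 = np.sum(terms.astype(np.float32) ** 2)  # fp32 accumulation

print(acc16, acc32)  # inf 73728.0
```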

Maybe it is a slightly unrealistic short-term goal to get these huge convs working on CPU (correct me if I'm wrong, @hanhanW?). I'll try 512x512 and see if I...

Update: 512x512 vae decode also spits out NaNs at the same dispatch, though it has smaller sizes `@main_dispatch_244_conv_2d_nchw_fchw_1x128x1024x1024x128x3x3`:

```
hal.executable public @main_dispatch_244 {
  hal.executable.variant public @embedded_elf_x86_64 target() {
    hal.executable.export public...
```
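For anyone trying to repro, this is roughly how I'm checking outputs for NaNs — a sketch with hypothetical file and function names (the entry point and latent shape depend on how the module was exported):

```
import numpy as np
import iree.runtime as ireert

config = ireert.Config("local-task")
ctx = ireert.SystemContext(config=config)
with open("vae_decode_cpu.vmfb", "rb") as f:  # hypothetical path
    ctx.add_vm_module(ireert.VmModule.from_flatbuffer(config.vm_instance, f.read()))

latents = np.random.randn(1, 4, 64, 64).astype(np.float16)  # 512x512 decode
out = ctx.modules.module["main"](latents).to_host()         # "main" is assumed
print("any NaN:", np.isnan(out).any(), "any inf:", np.isinf(out).any())
```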

Very similar issue on SDXL VAE:

```
python ..\models\turbine_models\custom_models\sdxl_inference\vae.py --compile_to=vmfb --external_weights=safetensors --device=rocm --variant="decode" --precision="fp16" --iree_target_triple=gfx1100 --external_weight_path=stable_diffusion_xl_base_1_0_vae.safetensors
C:\V\SHARK-Turbine\turb.env\Lib\site-packages\diffusers\utils\outputs.py:63: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
  torch.utils._pytree._register_pytree_node(
C:\V\SHARK-Turbine\turb.env\Lib\site-packages\diffusers\utils\outputs.py:63: UserWarning: torch.utils._pytree._register_pytree_node is...
```
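Side note: the pytree deprecation spam comes from diffusers on newer torch and is unrelated to the failure; if it's cluttering logs, something like this quiets it (a workaround, not a fix):

```
import warnings

# Silence diffusers' use of the deprecated torch pytree registration API.
warnings.filterwarnings(
    "ignore", message=".*_register_pytree_node is deprecated.*"
)
```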

The attached `out.txt` shows that TileAndDecomposeAttention was the last successful pass and EliminateEmptyTensors is the first to fail. The IR after TileAndDecomposeAttention shows an address being reused many times in the innermost decomposed...
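To get a pass-by-pass dump like `out.txt`, something along these lines works — `--mlir-print-ir-after-all` and `--mlir-disable-threading` are standard MLIR options; the file names and driver script here are just illustrative:

```
import subprocess

# Dump IR after every pass; --mlir-disable-threading keeps the dump ordered.
with open("out.txt", "w") as log:
    subprocess.run(
        ["iree-compile", "minimal_attn.mlir",
         "--iree-hal-target-backends=llvm-cpu",
         "--mlir-print-ir-after-all",
         "--mlir-disable-threading",
         "-o", "minimal_attn.vmfb"],
        stderr=log,
    )
```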

@MaheshRavishankar FWIW: followed instructions to run `iree-opt --iree-eliminate-empty-tensors --empty-tensor-to-alloc-tensor --iree-codegen-iree-comprehensive-bufferize fixed.mlir`. Output of iree-opt (above command): [minimal_attn_elim.mlir.txt](https://github.com/openxla/iree/files/14365087/minimal_attn_elim.mlir.txt)

```
iree-compile .\minimal_attn_elim.mlir --iree-input-type=auto --iree-vm-bytecode-module-output-format=flatbuffer-binary --iree-hal-target-backends=llvm-cpu --iree-llvmcpu-embedded-linker-path=C:\V\iree\build\compiler\bindings\python\iree\compiler\tools\..\_mlir_libs\iree-lld.exe --mlir-print-debuginfo --mlir-print-op-on-diagnostic=false --mlir-pass-pipeline-crash-reproducer=./shark_tmp/core-reproducer.mlir --iree-input-type=torch --mlir-print-debuginfo --mlir-print-op-on-diagnostic=false...
```
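Scripted version of that iree-opt repro step, for convenience (the pass flags are the ones above; paths are from my local setup and hypothetical for anyone else):

```
import subprocess

# Run the three-pass bufferization repro and surface any diagnostics.
passes = [
    "--iree-eliminate-empty-tensors",
    "--empty-tensor-to-alloc-tensor",
    "--iree-codegen-iree-comprehensive-bufferize",
]
result = subprocess.run(
    ["iree-opt", *passes, "fixed.mlir", "-o", "minimal_attn_elim.mlir"],
    capture_output=True, text=True,
)
print(result.stderr if result.returncode else "bufferization succeeded")
```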