Jakub Kuderski
I verified the new code generated by https://github.com/jtuyls/iree/tree/padEncodinge2e-2, and I think it works as we intended and gives the expected improvements. I tried it on the shapes...
As to why some shapes see a higher speedup than others: it boils down to how many loads of each operand get generated per loop iteration. Padding helps most when each...
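To make the "loads per loop iteration" point concrete, here is a deliberately simplified sketch of the effect (a toy cost model, not IREE's actual codegen: the vector width, tile sizes, and the scalar-fallback assumption are all hypothetical):

```python
# Toy model: estimate the buffer loads needed per loop iteration to read a
# tile of `k_tile` elements of one operand, assuming loads are
# `vector_width` elements wide. The numbers are illustrative only; IREE's
# real tiling/vectorization decisions may differ.
def loads_per_iteration(k_tile: int, vector_width: int) -> int:
    if k_tile % vector_width == 0:
        # Aligned tile: every load can be a full-width vector load.
        return k_tile // vector_width
    # Toy assumption: a misaligned tile falls back to element-wise loads.
    return k_tile

# Unpadded K tile of 17 vs. the same tile padded up to 20 (vector width 4).
print(loads_per_iteration(17, 4))  # 17 element-wise loads
print(loads_per_iteration(20, 4))  # 5 vector loads
```

Under this toy model, the shapes that benefit most are the ones where the unpadded layout forces the narrow-load path on every iteration, which is consistent with the thread trace below at the buffer-load level.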
Threadtrace confirms that padding helps at the level of buffer load instructions:  
omniperf also confirms higher cache bandwidth with padding.

Commands:
```
rocprof-compute profile --name -d 3 --no-roof -- ~/iree/relass/tools/iree-benchmark-module --device=hip://0 --device_allocator=caching --hip_use_streams=true --module= --benchmark_repetitions=1
rocprof-compute analyze -p...
```
@Groverkss has a WIP PR for this on the LLVMGPU side here: https://github.com/openxla/iree/pull/16927. Kunwar, could you also take care of the SPIR-V path?
> AFAIU my patch should also take care of SPIRV

Can you also add a SPIR-V regression test based on the batch matmul from this issue?
> (is there an AMD list for the CLA or should I do that on an individual basis?)

Select 'For myself' or something along these lines.
cc: @MaheshRavishankar
This looks very useful, @josemonsalve2 and @CRobeck! Left some comments, mostly on coding style.
> I'm trying to reproduce the CI - Linux x64 bazel / linux_x64_bazel (pull_request), but I cannot find the scripts.
>
> ```
> ./build_tools/bazel/install_bazelisk.sh 1.21.0
> cp ./build_tools/scripts/fetch_cuda_deps.sh /usr/local/bin...