Jakub Kuderski
I verified the new code generated by https://github.com/jtuyls/iree/tree/padEncodinge2e-2, and I think it works as we intended and gives the expected improvements. I tried it on the shapes...
As to why some shapes see a higher speedup than others: it boils down to how many loads of each operand get generated per loop iteration. Padding helps most when each...
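To make the "loads per loop iteration" point concrete, here is a deliberately simplified sketch of the effect (a toy cost model, not IREE's actual codegen: the vector width, tile sizes, and the scalar-fallback assumption are all hypothetical):

```python
# Toy model: estimate the buffer loads needed per loop iteration to read a
# tile of `k_tile` elements of one operand, assuming loads are
# `vector_width` elements wide. The numbers are illustrative only; IREE's
# real tiling/vectorization decisions may differ.
def loads_per_iteration(k_tile: int, vector_width: int) -> int:
    if k_tile % vector_width == 0:
        # Aligned tile: every load can be a full-width vector load.
        return k_tile // vector_width
    # Toy assumption: a misaligned tile falls back to element-wise loads.
    return k_tile

# Unpadded K tile of 17 vs. the same tile padded up to 20 (vector width 4).
print(loads_per_iteration(17, 4))  # 17 element-wise loads
print(loads_per_iteration(20, 4))  # 5 vector loads
```

Under this toy model, the shapes that benefit most are the ones where the unpadded layout forces the narrow-load path on every iteration, which is consistent with the thread trace below at the buffer-load level.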
Threadtrace confirms that padding helps at the level of buffer load instructions:  
omniperf also confirms higher cache bandwidth with padding.

Commands:
```
rocprof-compute profile --name -d 3 --no-roof -- ~/iree/relass/tools/iree-benchmark-module --device=hip://0 --device_allocator=caching --hip_use_streams=true --module= --benchmark_repetitions=1
rocprof-compute analyze -p...
```
@Groverkss has a WIP PR for this on the LLVMGPU side here: https://github.com/openxla/iree/pull/16927. Kunwar, could you also take care of the SPIR-V path?
> AFAIU my patch should also take care of SPIRV

Can you also add a SPIR-V regression test based on the batch matmul from this issue?
> (is there an AMD list for the CLA or should I do that on an individual basis?)

Select 'For myself' or something along these lines.
cc: @MaheshRavishankar
This looks very useful, @josemonsalve2 and @CRobeck! Left some comments, mostly on coding style.
> I'm trying to reproduce the CI - Linux x64 bazel / linux_x64_bazel (pull_request), but I cannot find the scripts.
>
> ```
> ./build_tools/bazel/install_bazelisk.sh 1.21.0
> cp ./build_tools/scripts/fetch_cuda_deps.sh /usr/local/bin...