iree
[LLVMGPU] Fix prefetching pass for nested loops
The prefetch pass assumes that shared memory can be reused in the prologue. This assumption may not hold when nested loops are involved, so we need to explicitly insert a barrier to ensure this memory can be reused safely.
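To illustrate the hazard, here is a toy Python model. It is not IREE's actual schedule or pass code. It assumes a simplified prefetched inner loop whose prologue stages the first tile into shared memory, and that the last operation of the inner loop is a shared-memory read. The function names (`prefetched_trace`, `has_unsynchronized_reuse`) and the op labels are hypothetical, chosen only for this sketch.

```python
def prefetched_trace(outer_trips, inner_trips, barrier_before_prologue):
    """Op sequence for a prefetched inner loop nested in an outer loop.

    Toy model only: the op names and ordering are assumptions made for
    illustration, not the schedule IREE actually emits.
    """
    trace = []
    for _ in range(outer_trips):
        # Prologue: stage the first tile into shared memory.
        if barrier_before_prologue:
            trace.append("barrier")  # the explicitly inserted barrier
        trace.append("global_read")
        trace.append("shared_write")
        for k in range(inner_trips):
            trace.append("barrier")      # make the staged tile visible
            trace.append("shared_read")  # compute on the current tile
            if k + 1 < inner_trips:
                trace.append("global_read")   # prefetch the next tile
                trace.append("barrier")       # readers finish before overwrite
                trace.append("shared_write")
    return trace


def has_unsynchronized_reuse(trace):
    """Return True if a shared_write follows a shared_read with no barrier
    in between, i.e. a write-after-read hazard on shared memory."""
    pending_read = False
    for op in trace:
        if op == "barrier":
            pending_read = False
        elif op == "shared_read":
            pending_read = True
        elif op == "shared_write" and pending_read:
            return True
    return False


# A single (non-nested) loop runs the prologue once, so reuse is safe.
single_loop_hazard = has_unsynchronized_reuse(prefetched_trace(1, 3, False))
# Nested: the prologue runs again right after the previous inner loop's reads.
nested_hazard = has_unsynchronized_reuse(prefetched_trace(2, 3, False))
# Nested, with a barrier inserted ahead of the prologue.
nested_fixed_hazard = has_unsynchronized_reuse(prefetched_trace(2, 3, True))
```

In this model the hazard shows up only when the prologue runs more than once, which is the nested-loop case, and goes away once a barrier precedes the prologue.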