Valentin Churavy

1413 comments by Valentin Churavy

* **#541**
* **#539** 👈
* `main`

This stack of pull requests is managed by Graphite.

Well, the only issue is that my benchmarks are angry at me... and I can't reproduce it locally...

```
InvalidInstruction: Can't translate llvm instruction:
Global variable cannot have Function storage class. Consider setting a proper address space.
Original LLVM value: @exception.1.8 = private unnamed_addr constant [10 x i8]...
```

SparseArrays can be moved to an extension; it was introduced in https://github.com/JuliaGPU/KernelAbstractions.jl/pull/269. StaticArrays is going to be much harder. We need a reliable implementation of a stack-allocated array and...
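For context, a Julia package extension (supported since Julia 1.9) is declared in the parent package's `Project.toml`: the optional dependency moves from `[deps]` to `[weakdeps]`, and an extension module is mapped to it under `[extensions]`. The sketch below is a minimal illustration under that assumption; the module name `SparseArraysExt` (and its file `ext/SparseArraysExt.jl`) is hypothetical and not taken from KernelAbstractions' actual configuration.

```toml
# SparseArrays is removed from [deps] and listed here instead,
# so it is no longer a hard dependency of the parent package.
[weakdeps]
SparseArrays = "2f01184e-e22b-5df5-ae63-d93ebab69eaf"

# Hypothetical extension module (ext/SparseArraysExt.jl) that is
# loaded only once SparseArrays is loaded alongside the parent package.
[extensions]
SparseArraysExt = "SparseArrays"
```

With this layout, any SparseArrays-specific methods live in the extension module and are picked up automatically when a user runs `using SparseArrays` together with the parent package.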

Some more simplification:

```
using AMDGPU

nx, ny, nz = 10, 1, 1
Nx, Ny, Nz = 1, 1, 1

"""
    assume(cond::Bool)

Assume that the condition `cond` is true. This...
```

And again:

```julia
using AMDGPU

nx, ny, nz = 10, 1, 1
Nx, Ny, Nz = 1, 1, 1

function gpu_kernel_xx!(tensor, Nx::Int64, Ny::Int64; )
    workitems = CartesianIndices((10, 1, 1))
    blocks...
```

```julia
using AMDGPU

nx, ny, nz = 10, 1, 1
Nx, Ny, Nz = 1, 1, 1

function gpu_kernel_xx!(tensor, Nx::Int64, Ny::Int64; )
    workitems = CartesianIndices((10, 1, 1))
    tI = AMDGPU.threadIdx().x...
```

The LLVM IR is becoming manageable:

```llvm
; @ /home/vchuravy/src/KernelAbstractions/issue517/repr.jl:6 within `gpu_kernel_xx!`
define amdgpu_kernel void @_Z14gpu_kernel_xx_14ROCDeviceArrayI7Float64Ll3ELl1EE5Int64S1_({ i64, i64, i64, i64, i64, i64, i32, i32, i64, i64, i64, i64 } %state,...
```

Yeah, pretty much; as far as I can tell, this is very cursed.

```
@inbounds workitems[i] # accessing `workitems` triggers miscompilation
```

Yeah, that's what I figured out as well,...

My reproducer ended up looking like:

```
function gpu_kernel_xx!(tensor, Nx::Int64, Ny::Int64; )
    workitems = CartesianIndices((10, 1, 1))
    tI = AMDGPU.threadIdx().x
    if tI
```