Carsten Bauer

99 comments by Carsten Bauer

You want the LLVM IR, i.e. `@device_code_llvm`, right? I get something, but then it segfaults as well.

```julia
julia> @device_code_llvm dump_module=true call_kernel()
; PTX CompilerJob of kernel kernel_wmma_tf32_lowlevel(CuDeviceMatrix{Float32, 1},...
```

> You want to include `dump_module=true` so that we can process the IR with `llc` outside of Julia. If that still asserts, we can try reducing it to figure out...

Thanks for helping me debug this, Tim! I was using the wrong fragment sizes. I fixed this in the latest commit, with which `call_kernel()` now runs through without any...

**TODOs**

* [x] add tests
* [x] docstrings
* [ ] (maybe high-level API)

Note: The tests are failing because they run under Julia 1.6 (and this PR requires Julia 1.8). Is there anything I can or should do about it on my end, @maleadt?

If the tests pass (which they should), we can mark this as non-draft IMHO.

> Should also be rebased so that CI runs.

Done (I think).

Came up here again: https://discourse.julialang.org/t/determinant-of-cuda-matrix/77245/3

I mentioned

```julia
julia> M = rand(10,10);

julia> Mgpu = cu(M);

julia> det(M) # cpu
-0.10243723110926993

julia> prod(diag(LinearAlgebra.LAPACK.getrf!(M)[1])) # cpu
-0.10243723110926993

julia> prod(diag(CUSOLVER.getrf!(Mgpu)[1])) # gpu...
```

Was going to suggest the same as Steven, i.e. multiplying by the sign of the permutation. My draft implementation was

```julia
function det!(m::CuMatOrAdj)
    X_d, ipiv_d = CUSOLVER.getrf!(m)
    diags = Array(diag(X_d))...
```
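The sign correction can be sketched on the CPU, using `LinearAlgebra.LAPACK.getrf!` as a stand-in for `CUSOLVER.getrf!`. Both follow the LAPACK `getrf` convention: `P*A = L*U` with unit-diagonal `L`, so `det(A) = sign(P) * prod(diag(U))`, and every pivot entry with `ipiv[i] != i` is one row interchange that flips the sign. The function name `det_from_getrf!` is hypothetical. For the GPU case, this assumes the pivot vector has first been copied to the host (e.g. with `Array`).

```julia
using LinearAlgebra

# Hypothetical helper: determinant from LAPACK's LU factorization, including
# the sign of the row permutation (the part missing from prod(diag(U)) alone).
function det_from_getrf!(A::AbstractMatrix{<:LinearAlgebra.BlasFloat})
    LU, ipiv, info = LinearAlgebra.LAPACK.getrf!(A)  # overwrites A with L and U
    # ipiv[i] != i means row i was swapped with row ipiv[i]: one transposition each.
    nswaps = count(i -> ipiv[i] != i, eachindex(ipiv))
    s = isodd(nswaps) ? -one(eltype(LU)) : one(eltype(LU))
    # L has a unit diagonal, so det(L) == 1 and only U's diagonal contributes.
    # A singular matrix (info > 0) has an exact zero on U's diagonal, giving 0.
    return s * prod(diag(LU))
end
```

For example, the permutation matrix `[0.0 1.0; 1.0 0.0]` needs a single row swap, so `prod(diag(U))` is `1.0` while the determinant is `-1.0`.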

> PS. This is not "type piracy" because the `CuMatOrAdj` type is specific to this package.

I guess he meant that this is type piracy in _his_ code (i.e. outside...
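The ownership rule behind this distinction can be illustrated with a small sketch: extending a function from another module is fine as long as at least one argument type is owned by the code doing the extending. `MyMatrix` below is a hypothetical stand-in for a package-owned type like `CuMatOrAdj`.

```julia
using LinearAlgebra

# Hypothetical type owned by "our" package (stand-in for CuMatOrAdj).
struct MyMatrix{T} <: AbstractMatrix{T}
    data::Matrix{T}
end
Base.size(m::MyMatrix) = size(m.data)
Base.getindex(m::MyMatrix, i::Int, j::Int) = m.data[i, j]

# Not piracy: LinearAlgebra owns `det`, but we own `MyMatrix`, so this method
# only changes behavior for calls that involve our own type.
LinearAlgebra.det(m::MyMatrix) = det(m.data)

# Piracy would be a definition like `LinearAlgebra.det(m::Matrix{Float64}) = ...`
# in user code: neither the function nor the argument type is owned there, so it
# silently changes `det` for every other package using Matrix{Float64}.
```

By the same reasoning, a `det` method for `CuMatOrAdj` defined inside CUDA.jl is fine, while the identical definition in user code owns neither the function nor the type.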