wlruys

Results 22 comments of wlruys

Technically, a no-op unmqr can pass the orthogonality check for the different transpose and side configurations. It might be worth modifying the test to check for this.
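To illustrate the concern, here is a minimal pure-Python sketch (not the actual Kokkos Kernels test). It assumes the orthogonality check forms Q by applying the routine to the identity and then tests that Q^T Q is close to I. `apply_q_noop` and `apply_q_ref` are hypothetical stand-ins for a broken and a working unmqr, and only the left-side, no-transpose case is covered. A no-op returns the identity, which is trivially orthogonal, so a second check against the factored matrix (Q R close to A) is needed to catch it.

```python
import math

def matmul(a, b):
    """Dense product of two list-of-lists matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

def identity(n):
    return [[float(i == j) for j in range(n)] for i in range(n)]

def close(a, b, tol=1e-12):
    return all(abs(x - y) <= tol for ra, rb in zip(a, b) for x, y in zip(ra, rb))

def householder(x):
    """Reflector H = I - 2 v v^T / (v^T v) that maps x onto a multiple of e_1."""
    norm = math.sqrt(sum(t * t for t in x))
    v = list(x)
    v[0] += math.copysign(norm, x[0])
    vv = sum(t * t for t in v)
    n = len(x)
    return [[float(i == j) - 2.0 * v[i] * v[j] / vv for j in range(n)]
            for i in range(n)]

# One-column example: the QR factorization needs a single reflector, so Q = H.
A = [[3.0], [4.0], [0.0]]
H = householder([row[0] for row in A])
R = matmul(H, A)  # H is symmetric, so Q^T A = H A

def apply_q_ref(C):
    """Hypothetical working unmqr (left side, no transpose): returns Q C."""
    return matmul(H, C)

def apply_q_noop(C):
    """Hypothetical broken unmqr: returns C unchanged."""
    return [list(row) for row in C]

def orthogonality_check(apply_q, n=3):
    """Assumed form of the existing test: build Q from I, check Q^T Q ~ I."""
    Q = apply_q(identity(n))
    return close(matmul(transpose(Q), Q), identity(n))

def reconstruction_check(apply_q):
    """Stronger test: applying Q to R must reproduce A."""
    return close(apply_q(R), A)

for name, fn in [("reference", apply_q_ref), ("no-op", apply_q_noop)]:
    print(name, orthogonality_check(fn), reconstruction_check(fn))
```

The same idea would extend to the other side and transpose configurations by comparing each result against a reference product rather than only checking orthogonality.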

> I am happy to look, but it is probably good if someone like @e10harvey could look at TPL stuff. You probably meant TPL for CuSolver (not LAPACK)?

Yeah, CUSOLVER...

These should all be resolved. Let me know if further changes are needed so we can test and merge :)

Hmm, seems like there's a problem. I'll check it out.

> It's from the empty (no-TPL) warning function in src/blas/impl/KokkosBlas_geqrf_impl.hpp (and for unmqr). https://github.com/kokkos/kokkos-kernels/pull/1165/files#diff-2c3c052a6623ce505916cdc4b14ffd7c41444390ba679fb08bbf2eef92910298R58 Should I just remove all of the references to non-TPL code and remove those?

Hmm, something still seems broken in the test. I'll reproduce this locally and resolve it.

Unfortunately, it seems like the issue is also present on older ROCm versions. (I installed with 4.3.1, and my CuPy config is shown below.) For the script above, this produces:...

Thanks for the note! Yeah, I have also noticed that the memory allocations do not release the GIL (even on different devices & streams) either with or without the new...
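One rough way to probe whether a call releases the GIL is to time it across several threads: a call that releases the GIL should overlap, while one that holds it should serialize. The sketch below uses only the standard library, with `time.sleep` standing in for a GIL-releasing call and a pure-Python loop standing in for one that holds it. These stand-ins and the helper `run_in_threads` are assumptions for illustration, not the script from the thread. To test allocations, one would swap in the allocation call, with each thread on its own device or stream.

```python
import threading
import time

def run_in_threads(fn, n_threads):
    """Run fn once per thread; return wall-clock seconds until all finish."""
    threads = [threading.Thread(target=fn) for _ in range(n_threads)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start

def releases_gil():
    # time.sleep releases the GIL while waiting, so threads overlap.
    time.sleep(0.2)

def holds_gil():
    # A pure-Python loop keeps the GIL on standard CPython builds,
    # so threads running it largely take turns.
    total = 0
    for i in range(1_000_000):
        total += i

for fn in (releases_gil, holds_gil):
    one = run_in_threads(fn, 1)
    four = run_in_threads(fn, 4)
    # A ratio near 1 suggests the call overlaps; near 4 suggests serialization.
    print(f"{fn.__name__}: 1 thread {one:.3f}s, 4 threads {four:.3f}s")
```

The timings are only indicative, since thread startup and scheduler noise also contribute.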

Hmm, never mind, it seems like it should be. So I'm not sure what is going on in the above timings. https://github.com/cupy/cupy/blob/d8e970ec0403376471ac0f71e006f9623569aa34/cupy_backends/cuda/api/runtime.pyx#L473