Leo Fang

Results 1175 comments of Leo Fang

New thought, recorded here in case I forget. I am getting more and more concerned with the performance of building a cache... We needed Jitify back then because lots of...

> perhaps the way to go should be:
> * Unconditionally include `cupy/cuda_workaround.h` whenever CUB code is compiled
> * Avoid calling `jitify._init_module()` in CuPy routines
> * Only allow...

> 1. There are a few places where the similar treatment can be done to avoid using Jitify

Done in https://github.com/cupy/cupy/pull/8467. I am thinking this issue can be closed, now...

@SnzFor16Min it's not about conda-forge. It's v13.3.0 skipping building the Jitify cache in most cases. If you can help confirm, please check with `pip install "cupy-cuda12x==13.2.0"` and see if it...

> Just to clarify is the pip install meant to be 13.3.0 too?

When @SnzFor16Min installed from pip, v13.3.0 was installed and it worked (no surprise). I was asking to...

I think we might be able to steal MAGMA's kernels for this purpose 😄

Thanks, @asi1024. I will do a case survey and get back to you. My current concerns would be
- Don't we need to at least declare the number of inputs?...

> Maybe we should also allow users to specify options to be passed to NVRTC (c.f. #6670)

Yes, I think it was listed as a TODO in #5280 (CuPy JIT...

Adding a quick note here for future reference. A recent offline discussion with the team suggests that [ctypes](https://docs.python.org/3/library/ctypes.html)-based type declaration would be favored. Using the new cuFFT callback (#8242)...

@dalcinl We still need some `ctypes` machinery such as annotating pointers, so that we can e.g. register foreign C function signatures. What would be the alternative if we do not...
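
For readers unfamiliar with what "annotating pointers" and declaring a C function signature look like in `ctypes`, here is a minimal sketch using only the standard library. It is not CuPy's API and does not reflect any design decided in the thread; `FloatPtr`, `LoadCallback`, and `load_cb` are hypothetical names chosen for illustration, and the signature `float (*)(float*, size_t)` is an assumption.

```python
import ctypes

# Pointer annotation: the ctypes type corresponding to `float*`.
FloatPtr = ctypes.POINTER(ctypes.c_float)

# Hypothetical C signature: float load_cb(float* data, size_t offset)
# CFUNCTYPE takes the return type first, then the argument types.
LoadCallback = ctypes.CFUNCTYPE(ctypes.c_float, FloatPtr, ctypes.c_size_t)


@LoadCallback
def load_cb(data, offset):
    # `data` arrives as a FloatPtr instance; indexing dereferences it.
    return data[offset] * 2.0


# Build a small C float array and pass it through the typed callback.
buf = (ctypes.c_float * 4)(1.0, 2.0, 3.0, 4.0)
result = load_cb(ctypes.cast(buf, FloatPtr), 2)
print(result)
```

The point of the sketch is only that the pointer type and the full signature are stated up front as Python objects, which is the kind of information a registration mechanism for foreign functions would need.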