Leo Fang

Results 1175 comments of Leo Fang

(EDIT: updated the above `pixi.toml` to better reflect the same intention -- having a Cartesian product of build variants -- as in CuPy's case, https://github.com/prefix-dev/pixi/issues/4139#issuecomment-3518989605).
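
For readers unfamiliar with the term, here is a minimal Python sketch of what a Cartesian product of build variants means. The variant keys and values (`python`, `cuda`) are hypothetical and are not taken from the `pixi.toml` discussed above.

```python
from itertools import product

# Hypothetical variant axes; the real keys and values live in the pixi.toml.
variants = {
    "python": ["3.12", "3.13"],
    "cuda": ["12", "13"],
}

# One build per combination of values, i.e. the Cartesian product of the axes.
builds = [dict(zip(variants, combo)) for combo in product(*variants.values())]

for build in builds:
    print(build)
```

With two values on each of two axes, this yields four builds.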

> Still thinking out loud... sorry for message bombs 😛
>
> ```toml
> # ...
> [package.host-dependencies]
> # ...
> # = "*" # # for the compiler...
> ```

> I think it no longer makes sense to bundle FP16 headers in CuPy packages. Today `cuda_fp16.h` depends on `vector_types.h` etc. which must be installed separately (either in `/usr/local/cuda` etc...

I think this is already done. Feel free to reopen if there is anything missing.

Yes, `cumsum` currently is backed by CUB scan so it's affected by the CCCL issue that you linked to. The CCCL team has a plan to add deterministic scan support...
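
As background on why determinism matters for a scan, the sketch below assumes the concern is floating-point reproducibility. It does not describe how CUB implements scan; it only shows that floating-point addition is not associative, so a result can change when the order in which partial sums are combined changes.

```python
from itertools import accumulate

values = [0.1, 0.2, 0.3]

# Sequential (left-to-right) inclusive scan, like a serial cumsum.
sequential = list(accumulate(values))

# The same total computed with two different groupings of the additions.
left_grouped = (values[0] + values[1]) + values[2]
right_grouped = values[0] + (values[1] + values[2])

print(sequential[-1] == left_grouped)   # same grouping as the serial scan
print(left_grouped == right_grouped)    # different grouping, different bits
```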

(On my cell, sorry for brevity.) I noticed this PR will have `sizeof(CArray)>=1040` which is a bit enormous, since the default kernel buffer size is 4096 bytes. I have been...
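
To put the size concern in numbers, here is a rough sketch using `ctypes`. The struct layout is an assumption, not CuPy's actual `CArray` definition: a hypothetical struct with a 64-bit pointer, a 64-bit size, and fixed-length `shape` and `strides` arrays, with `MAX_NDIM = 64` chosen only because it reproduces the 1040-byte figure quoted above.

```python
import ctypes

MAX_NDIM = 64  # assumption: chosen so the total matches the quoted 1040 bytes


class HypotheticalCArray(ctypes.Structure):
    # Illustrative layout only; not CuPy's real CArray.
    _fields_ = [
        ("data", ctypes.c_uint64),               # stand-in for a 64-bit device pointer
        ("size", ctypes.c_int64),                # total number of elements
        ("shape", ctypes.c_int64 * MAX_NDIM),    # fixed-length shape array
        ("strides", ctypes.c_int64 * MAX_NDIM),  # fixed-length strides array
    ]


KERNEL_BUFFER_BYTES = 4096  # default kernel buffer size quoted in the comment

struct_bytes = ctypes.sizeof(HypotheticalCArray)
max_args = KERNEL_BUFFER_BYTES // struct_bytes

print(struct_bytes, max_args)
```

Under these assumptions, only three such structs passed by value would fit in the default buffer, before counting any other kernel arguments.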

WIP: https://github.com/asi1024/cupy/pull/3. Turns out it is easier than using variable-length arrays (which would **not** allow us to copy host structs by value as kernel arguments).

It seems we missed updating one last test. I raised a PR here: https://github.com/asi1024/cupy/pull/4