Leo Fang
(EDIT: updated the above `pixi.toml` to better reflect the same intention, a Cartesian product of build variants, as in CuPy's case: https://github.com/prefix-dev/pixi/issues/4139#issuecomment-3518989605).
> Still thinking out loud... sorry for message bombs 😛 > > ```toml > # ... > [package.host-dependencies] > # ... > # = "*" # # for the compiler...
> I think it no longer makes sense to bundle FP16 headers in CuPy packages. Today `cuda_fp16.h` depends on `vector_types.h` etc. which must be installed separately (either in `/usr/local/cuda` etc...
I think this is already done. Feel free to reopen if there is anything missing.
Yes, `cumsum` currently is backed by CUB scan so it's affected by the CCCL issue that you linked to. The CCCL team has a plan to add deterministic scan support...
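For readers wondering why a scan can be non-deterministic at all, here is a minimal pure-Python sketch. It assumes the linked issue is about floating-point reproducibility, which is my reading rather than something stated above. It does not call CuPy or CUB; it only shows that regrouping the same additions, as a parallel scan may do, can change the result.

```python
from itertools import accumulate

values = [0.1, 0.2, 0.3]

# Sequential (left-to-right) prefix sum, as a serial loop would compute it.
sequential = list(accumulate(values))

# Same inputs, different grouping: a parallel scan may combine partial
# sums in another order (illustrative assumption, not CUB's actual schedule).
regrouped_total = values[0] + (values[1] + values[2])

# Floating-point addition is not associative, so the two totals can differ.
differs = sequential[-1] != regrouped_total
print(sequential[-1], regrouped_total, differs)
```

If the grouping can vary from run to run, the last bits of the output can vary too, which is presumably what a deterministic scan mode would pin down.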
(On my cell, sorry for brevity.) I noticed this PR will have `sizeof(CArray) >= 1040`, which is quite large given that the default kernel buffer size is 4096 bytes. I have been...
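A rough back-of-the-envelope check of that size, using `ctypes`. The layout below is hypothetical (a data pointer, a size, and fixed-length shape/strides arrays with 64 entries each). It is not taken from the PR, just one layout that happens to add up to 1040 bytes.

```python
import ctypes

MAX_NDIM = 64              # hypothetical maximum number of dimensions
KERNEL_ARG_BUDGET = 4096   # bytes, the buffer size quoted above


class CArrayLike(ctypes.Structure):
    # Hypothetical layout, for illustration only; fixed-width fields keep
    # the size independent of the host platform.
    _fields_ = [
        ("data", ctypes.c_uint64),               # device pointer
        ("size", ctypes.c_int64),                # number of elements
        ("shape", ctypes.c_int64 * MAX_NDIM),    # extent per dimension
        ("strides", ctypes.c_int64 * MAX_NDIM),  # stride per dimension
    ]


struct_size = ctypes.sizeof(CArrayLike)
max_by_value_args = KERNEL_ARG_BUDGET // struct_size
print(struct_size, max_by_value_args)
```

Under these assumptions only three such structs fit in the argument buffer when passed by value, which is why the size matters for kernels taking several arrays.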
WIP: https://github.com/asi1024/cupy/pull/3. Turns out it is easier than using variable-length arrays (which would **not** allow us to copy host structs by value as kernel arguments).
/test mini
It seems we missed updating one last test; I raised a PR here: https://github.com/asi1024/cupy/pull/4
/test mini