Leo Fang
Friendly nudge @di 😅
Tested with the latest Cython (3.0.10) and it still crashed the Cython compiler... 😢
(Tentatively assigned to Federico as per our offline discussion 🙂)
> This is a blocker for nvmath-python to get rid of [its mandatory dependency on CuPy](https://github.com/NVIDIA/nvmath-python/blob/7c485842d0f3300e03ec780056936503913910fe/requirements/pip/nvmath-python-cu11.txt#L1) (so that CuPy can in turn depend on nvmath-python, without hitting circular dependency issues)....
> This is necessarily a host API and NVRTC can't compile host-code. We have a C library now, don't we? 🙂 @jrhemstad @gevtushenko Correct me if I am wrong since...
Another reason for NVRTC compatibility: I think to unblock nvmath-python, we should just focus on the D2D copies (between potentially two different memory layouts) for now, and let nvmath-python handle...
I believe you are right. We should think of this new copy routine as if it were a CUB device-wide algorithm. What I originally had in mind is really just...
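For readers skimming the thread, here is a rough host-side sketch of what "a copy between two different memory layouts" means. It is only an illustration in plain Python, not the proposed device routine: the name `strided_copy`, its signature, and the use of element strides are all assumptions made for this example, and a real implementation would run as a GPU kernel rather than a Python loop.

```python
from itertools import product


def strided_copy(src, src_strides, dst, dst_strides, shape):
    """Copy a logical N-d array between two flat buffers.

    Hypothetical helper for illustration only. Both buffers hold the same
    logical array of the given shape, but each has its own strides
    (counted in elements, not bytes).
    """
    for idx in product(*(range(n) for n in shape)):
        # Map the logical index to a flat offset in each layout.
        src_off = sum(i * s for i, s in zip(idx, src_strides))
        dst_off = sum(i * s for i, s in zip(idx, dst_strides))
        dst[dst_off] = src[src_off]
    return dst


# A 2x3 array stored row-major (strides (3, 1)) is copied into a
# column-major buffer (strides (1, 2)).
shape = (2, 3)
src = [0, 1, 2, 3, 4, 5]
dst = [None] * 6
strided_copy(src, (3, 1), dst, (1, 2), shape)
```

Treating the routine like a device-wide algorithm would mean the same index mapping is done per thread on the device, with the shapes and strides passed in as arguments.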
FYI, Apple MLX counterpart: https://github.com/ml-explore/mlx/pull/1421
@fbusato is this still on your radar, or should we find someone else to take over?
Sounds good, thanks Federico! Let's keep this on your plate then 🙂 A quick update on this: The nvmath-python team needs to be unblocked asap, so they've been looking into...