Sebastian Berg
> I think it is not suitable to do it without designated initializers { .tp_name = val } I think they are allowed to be written as long as they...
No, `unicode_arrtype_as_buffer` need not exist at all.
> seberg comment about assigning a pointer to a templated struct. I don't know how to do it

Maybe check out what `jax/ml_dtypes` is doing and see if you get...
Hmmm, the profiling above was maybe with few iterations or so (it's tricky to get nice profiles, since this doesn't take long but if you do too many iterations cuda...
FWIW, we already changed the code with the initial suggestion a while ago. I am not sure there is anything actionable left here?
Looks good to me, seems like matmul in `cupy/_core/_routines_linalg.pyx` uses the cimported `ascontiguousarray` for `dtype` support (and that looks like the only occurrence). I'll wager that this function is heavy...
> In my use-case, it's not difficult to work around. But I thought maybe this could maybe cause some performance issues or something 🤷🏻.

No, it's related to SIMD dispatching...
One other thing _might_ be to actually distinguish the two use-cases more explicitly. In the kernel use-case the consumer could just use the stream provided (maybe unless the user passes...
Sorry yeah, it always confused me too until I looked at it more often: the CUDA Array Interface (version 3: https://numba.readthedocs.io/en/stable/cuda/cuda_array_interface.html).
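
For anyone else tripping over the same thing, here is a rough sketch of how a consumer might read the optional `stream` entry that version 3 of the interface added. The helper name `describe_stream` and the example dictionary (including the made-up device pointer) are purely illustrative and not part of any library; the meaning of the values `None`, `0`, `1`, and `2` follows my reading of the linked spec.

```python
def describe_stream(interface):
    """Interpret the optional 'stream' entry of a __cuda_array_interface__ dict.

    Hypothetical helper for illustration only; it touches no GPU state.
    """
    if interface.get("version", 0) < 3:
        # The 'stream' entry was only introduced in version 3.
        return "no stream semantics (pre-v3)"
    stream = interface.get("stream")
    if stream is None:
        # Producer does not ask the consumer to synchronize.
        return "no synchronization requested"
    if stream == 0:
        # 0 is disallowed: ambiguous between None and the default stream.
        raise ValueError("stream=0 is disallowed")
    if stream == 1:
        return "legacy default stream"
    if stream == 2:
        return "per-thread default stream"
    # Any other integer is a cudaStream_t handle passed as a Python int.
    return f"user stream handle {stream:#x}"


# Example dictionary as a producer might export it (pointer value is made up).
example = {
    "shape": (4, 3),
    "typestr": "<f4",
    "data": (0x7F0000000000, False),  # (device pointer, read-only flag)
    "version": 3,
    "strides": None,  # None means C-contiguous
    "stream": 1,
}

print(describe_stream(example))
```

A consumer would normally synchronize on (or enqueue work after) the stream it is handed before touching the data; the sketch only classifies the value.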