Almog Segal
@manjugv please review.
Below is what my output looks like when I run with 16 processes. It gets harder to find what I need in the output when I run with more...
After some debugging, I found out that cuSPARSE isn't being called. The `from_scipy_sparse` function sets `indices_sorted=False` and then in `_bcoo_dot_general_gpu_lowering` the `_bcoo_dot_general_default_lowering` function is being called based on the `indices_sorted`...
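A minimal sketch of how to inspect the flag described above, assuming a recent JAX with `jax.experimental.sparse` and SciPy installed. The matrix shape, density, and variable names are made up for illustration. Sorting the indices sets `indices_sorted=True` on the result; whether the product then actually reaches cuSPARSE depends on the JAX version and on other conditions in the GPU lowering rule, which this snippet does not check.

```python
import numpy as np
import scipy.sparse as sp
import jax.numpy as jnp
from jax.experimental import sparse

# Small random SciPy matrix; shape and density are arbitrary (illustrative only).
mat_scipy = sp.random(8, 8, density=0.3, format="csr",
                      dtype=np.float32, random_state=0)

# Convert to BCOO and inspect the flag the comment above refers to.
mat = sparse.BCOO.from_scipy_sparse(mat_scipy)
print("indices_sorted after from_scipy_sparse:", mat.indices_sorted)

# sort_indices() returns a copy whose indices_sorted flag is True.
mat_sorted = mat.sort_indices()
print("indices_sorted after sort_indices():", mat_sorted.indices_sorted)

# Both variants should produce the same matrix-vector product.
vec = jnp.ones(8, dtype=jnp.float32)
out_unsorted = np.asarray(mat @ vec)
out_sorted = np.asarray(mat_sorted @ vec)
```

Comparing `out_unsorted` and `out_sorted` against the SciPy result is a cheap way to confirm that sorting changes only the metadata and ordering, not the values.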
@jakevdp do you have a suggestion as to what may be wrong in `_bcoo_dot_general_impl`? Is the same algorithm used for TPUs/CPUs?
@manjugv it definitely does. Thank you! For the long term, I think it would be nice to be able to query the context as I suggested, so libraries and other users...
@manjugv FYI.
> We discussed this in our WG. When you don’t give OOB, we need Allgather in the implementation and this feature is lacking in the implementation. This requires a lot...
The use case for that is to be able to perform a "broadcast" on row/col comms without having to create these comms. For linear algebra functionality, when you use a 2DBC data layout,...
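To make the row/column idea concrete, here is a small sketch of which ranks a row or column broadcast would target on a process grid. It assumes a row-major mapping of ranks onto an `nrows x ncols` grid and reads 2DBC as a 2D block-cyclic layout; the function names are hypothetical and not part of any library. It only computes the rank subsets, and how such a subset would be handed to the collective library is not shown.

```python
def grid_coords(rank, ncols):
    """Return (row, col) of a rank on a row-major process grid (assumed mapping)."""
    return divmod(rank, ncols)


def row_peers(rank, ncols):
    """Ranks that share a grid row with `rank`."""
    row, _ = grid_coords(rank, ncols)
    return [row * ncols + c for c in range(ncols)]


def col_peers(rank, nrows, ncols):
    """Ranks that share a grid column with `rank`."""
    _, col = grid_coords(rank, ncols)
    return [r * ncols + col for r in range(nrows)]


def block_owner(i, j, nrows, ncols):
    """Rank that owns block (i, j) under a 2D block-cyclic distribution."""
    return (i % nrows) * ncols + (j % ncols)


# 16 processes arranged as a 4 x 4 grid.
print(row_peers(6, 4))           # ranks in the same row as rank 6
print(col_peers(6, 4, 4))        # ranks in the same column as rank 6
print(block_owner(5, 2, 4, 4))   # owner of block (5, 2)
```

With a dedicated row or column communicator, each of these subsets would need its own communicator; the point of the sketch is that the subsets can be derived from the rank and grid shape alone.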