Sudhakar Singh
@Numeri is this resolved?
Closing as a duplicate.
@qixuanf was this resolved?
Closing since there was no response. (Feel free to reopen if the issue isn't resolved on your end.)
I reran the repro code as follows (on Google Colab). It seems the `loop` and `lax_f_scan` versions perform similarly. (There is a perf difference between CPU and GPU, but...
@holl- was this resolved?
@yiiyama maybe you could also try debugging with [nccl-tests](https://github.com/NVIDIA/nccl-tests)
@dionhaefner Was this resolved? Do you still need help?
@lee-van-oetz is this fixed now?
@rwightman can we consider this resolved?