Rui
I believe that would be part of the issue, and it would be an interesting item for getting familiar with all the different APIs. It would likely entail breaking it down...
We had minimally prototyped it with XLA + SPMD, mimicking communication across heterogeneous SPMD worlds, minus the native distributed pipelining APIs:

- SPMD localization (similar functionality to Siyuan's local...
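For context, a minimal sketch of what a per-stage local SPMD world could look like with the existing `torch_xla` SPMD APIs. This is not the prototype itself, and the per-stage device slice (`stage_devices`) is a hypothetical stand-in, not an existing API:

```python
import numpy as np
import torch
import torch_xla.core.xla_model as xm
import torch_xla.runtime as xr
import torch_xla.distributed.spmd as xs

xr.use_spmd()

# Hypothetical: in a real multi-stage setup this would be a per-stage
# slice of the global devices; here we just take all of them.
num_devices = xr.global_runtime_device_count()
stage_devices = np.arange(num_devices)

# A 2D mesh local to this stage: data-parallel x model-parallel axes.
mesh = xs.Mesh(stage_devices, (num_devices, 1), ("data", "model"))

# Shard a stage-local tensor over the mesh.
t = torch.randn(8, 128).to(xm.xla_device())
xs.mark_sharding(t, mesh, ("data", "model"))
```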
Thanks @zpcore!

> How do you plan to sync input/output between different SPMD stages?

In the RFC above, we started focusing on local SPMD worlds that have the same number of...
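To make the question concrete, here is a hedged sketch (not the RFC's actual mechanism) of syncing a stage boundary between two same-sized SPMD worlds with plain `torch.distributed` point-to-point ops; the ranks, shapes, and process-group wiring are all assumptions:

```python
import torch
import torch.distributed as dist

def sync_stage_boundary(out: torch.Tensor, is_sender: bool, peer: int) -> torch.Tensor | None:
    """Send this stage's output to the peer stage, or receive the peer's output."""
    if is_sender:
        # Forward this stage's output to the next stage.
        dist.send(out.contiguous(), dst=peer)
        return None
    # Receive the previous stage's output as this stage's input.
    buf = torch.empty_like(out)  # assumes both sides agree on shape/dtype
    dist.recv(buf, src=peer)
    return buf
```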
Sorry for the late response, I had a few other urgent items. Thanks for having a look! I'll start revisiting the item and questions.

> Are you proposing to have...
We had a great discussion, and I really appreciate your feedback, Kevin and Pedro! I will follow up on breaking down some of the items on our side, so it's...
I am also seeing this now:

```
#8  0x00007ffcf27eca28 in tsl::AsyncValue::GetTypeInfo (this=0x55555bf9e9c0) at external/xla/xla/tsl/concurrency/async_value.h:475
(gdb) p *this
$1 = {static kUnknownTypeId = 0, refcount_ = { = {static _S_alignment =...
```
Hey @ysiraichi, do we have a path forward on this one? It would be great to be able to use CPU locally on the container.
@ManfeiBai, how does it handle 0-dimensional tensors with values 0 or 1? These tensors are, by default, treated as constants: https://github.com/pytorch/xla/blob/e3cf356aaf05c02db1cca0ba19594cca8b85bf7f/configuration.yaml#L113 As it stands, it ends up sinking...
It seems that every test was deliberately written with an iterator tensor value of >= 2. I am investigating how to fix this, but it would be helpful to...
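A hedged way to observe the special-scalar behavior locally; `_get_xla_tensors_text` is `torch_xla`'s private IR dump helper, and which side of the constant vs. device-data split each value lands on is my reading of the linked config (flippable via `XLA_NO_SPECIAL_SCALARS`), not something this sketch guarantees:

```python
import torch
import torch_xla
import torch_xla.core.xla_model as xm

device = xm.xla_device()
t = torch.zeros(4, device=device)

a = t + 1.0  # scalar 1: by default expected to be inlined as an IR constant
b = t + 2.0  # scalar 2: expected to stay as device data, keeping the graph reusable

# Dump the lazy IR to compare how the two scalars were captured.
print(torch_xla._XLAC._get_xla_tensors_text([a]))
print(torch_xla._XLAC._get_xla_tensors_text([b]))
```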
Thanks @bfolie, this is great. For "Send and Recv", aren't the referenced passes for GPU XLA? I am not sure the motivation there was that Send/Recv was not suitable for...