Rui
I believe that would be part of the issue, and it would be an interesting item for getting familiar with all the different APIs. It would likely entail breaking it down...
We had minimally prototyped it with XLA + SPMD, mimicking communication across heterogeneous SPMD worlds, minus the native distributed pipelining APIs:

- SPMD localization (similar functionality to Siyuan's local...
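For context, a minimal sketch of what a per-stage local SPMD world could look like with the existing `torch_xla` SPMD APIs. This is not the prototype itself, and the per-stage device slice (`stage_devices`) is a hypothetical stand-in, not an existing API:

```python
import numpy as np
import torch
import torch_xla.core.xla_model as xm
import torch_xla.runtime as xr
import torch_xla.distributed.spmd as xs

xr.use_spmd()

# Hypothetical: in a real multi-stage setup this would be a per-stage
# slice of the global devices; here we just take all of them.
num_devices = xr.global_runtime_device_count()
stage_devices = np.arange(num_devices)

# A 2D mesh local to this stage: data-parallel x model-parallel axes.
mesh = xs.Mesh(stage_devices, (num_devices, 1), ("data", "model"))

# Shard a stage-local tensor over the mesh.
t = torch.randn(8, 128).to(xm.xla_device())
xs.mark_sharding(t, mesh, ("data", "model"))
```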
Thanks @zpcore!

> How do you plan to sync input/output between different SPMD stages?

In the RFC above, we started focusing on local SPMD worlds that have the same number of...
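To make the question concrete, here is a hedged sketch (not the RFC's actual mechanism) of syncing a stage boundary between two same-sized SPMD worlds with plain `torch.distributed` point-to-point ops; the ranks, shapes, and process-group wiring are all assumptions:

```python
import torch
import torch.distributed as dist

def sync_stage_boundary(out: torch.Tensor, is_sender: bool, peer: int) -> torch.Tensor | None:
    """Send this stage's output to the peer stage, or receive the peer's output."""
    if is_sender:
        # Forward this stage's output to the next stage.
        dist.send(out.contiguous(), dst=peer)
        return None
    # Receive the previous stage's output as this stage's input.
    buf = torch.empty_like(out)  # assumes both sides agree on shape/dtype
    dist.recv(buf, src=peer)
    return buf
```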
Sorry for the late response, I had a few other urgent items. Thanks for having a look! I'll start revisiting the item and questions.

> Are you proposing to have...
We had a great discussion, and I really appreciate your feedback, Kevin and Pedro! I will follow up on breaking down some of the items on our side, so it's...
I am also seeing this now:

```
#8  0x00007ffcf27eca28 in tsl::AsyncValue::GetTypeInfo (this=0x55555bf9e9c0) at external/xla/xla/tsl/concurrency/async_value.h:475
(gdb) p *this
$1 = {static kUnknownTypeId = 0, refcount_ = { = {static _S_alignment =...
```
Hey @ysiraichi, do we have a path forward on this one? It would be great to be able to use CPU locally on the container.
@ManfeiBai, how does it handle 0-dimensional tensors with values 0 or 1? These tensors are, by default, treated as constants: https://github.com/pytorch/xla/blob/e3cf356aaf05c02db1cca0ba19594cca8b85bf7f/configuration.yaml#L113 As it stands, it ends up sinking...
It seems that every test was deliberately written with an iterator tensor value of >= 2. I am investigating how to fix this, but it would be helpful to...
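A hedged way to observe the special-scalar behavior locally; `_get_xla_tensors_text` is `torch_xla`'s private IR dump helper, and which side of the constant vs. device-data split each value lands on is my reading of the linked config (flippable via `XLA_NO_SPECIAL_SCALARS`), not something this sketch guarantees:

```python
import torch
import torch_xla
import torch_xla.core.xla_model as xm

device = xm.xla_device()
t = torch.zeros(4, device=device)

a = t + 1.0  # scalar 1: by default expected to be inlined as an IR constant
b = t + 2.0  # scalar 2: expected to stay as device data, keeping the graph reusable

# Dump the lazy IR to compare how the two scalars were captured.
print(torch_xla._XLAC._get_xla_tensors_text([a]))
print(torch_xla._XLAC._get_xla_tensors_text([b]))
```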
Thanks @bfolie, this is great. For "Send and Recv", aren't the referenced passes for GPU XLA? I am not sure the motivation there was that Send/Recv was not suitable for...