Yeounoh Chung comments

Results: 49 comments of Yeounoh Chung

Lower `randperm`

Re-assigning the issue to @vanbasten23.

[SPMD][PoC] compile & execute with PjRt

Going to break down this PR into smaller ones and land them separately. Some of the unit tests need to land at the end, after all the changes have landed. cc @JackCaoG...

[SPMD][PoC] compile & execute with PjRt

cc @jonb377 FYI

[SPMD][PoC] compile & execute with PjRt

@JackCaoG could you help review this PR? This would be the last of the three PoC PRs. We need this to unblock @steventk-g for his virtual device work. I...

[SPMD][PoC] compile & execute with PjRt

> This PR is still too big and hard to review. I would argue that we don't have enough tests to confidently land this. It is OK to make an...

[SPMD][PoC] compile & execute with PjRt

> You need to update this PR a bit to account for the new tf update; it should not be too many code changes, though.

Do we have a reference PR in...

[SPMD][PoC] compile & execute with PjRt

@JackCaoG I addressed the comments, thanks again for the review. I was able to build locally and test.

[SPMD][PoC] compile & execute with PjRt

> Seems like test strict out crashed. @yeounoh were you able to run a simple pytorch/xla program locally?

Thanks @JackCaoG, the tests were green at some point (or not??)....

[SPMD][PoC] compile & execute with PjRt

The CPU test passes, but the GPU test fails with the following seemingly unrelated (at least at the outset) error: ``` *** Begin stack trace *** tsl::CurrentStackTrace[abi:cxx11]() gsignal abort xla::XrtLocalService::XrtLocalService(std::__cxx11::basic_string const&, std::__cxx11::basic_string...

[SPMD][PoC] compile & execute with PjRt

> Seems irrelevant, let me just restart the GPU CI.

Yea, this [one](https://app.circleci.com/pipelines/github/pytorch/xla/13795/workflows/7206b828-64a7-4a6c-8307-cee3a8b54e06) succeeded. Thanks @JackCaoG