Pedro Goncalves Mokarzel

Results: 15 issues by Pedro Goncalves Mokarzel

I am following the steps in https://github.com/AI-Hypercomputer/xpk?tab=readme-ov-file#installation after completing the [pre-requisites](https://github.com/AI-Hypercomputer/xpk?tab=readme-ov-file#installation). Running `pip install xpk` produces the error: ``` ERROR: pip's dependency resolver does not currently take into account...

Currently in PyTorch/XLA, when a tensor is initialized through sharding, it is loaded onto its associated devices immediately. From a logical standpoint, `mark_sharding` acts similarly to calling `.to('xla')`...
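
A minimal sketch of the behavior described above, assuming the standard SPMD setup from `torch_xla.distributed.spmd`; the mesh shape and partition spec are illustrative, not taken from the issue:

```python
# Sketch: both paths below materialize data on the XLA devices eagerly.
import numpy as np
import torch
import torch_xla.runtime as xr
import torch_xla.distributed.spmd as xs

xr.use_spmd()  # enable SPMD execution mode

num_devices = xr.global_runtime_device_count()
mesh = xs.Mesh(np.arange(num_devices), (num_devices, 1), ('data', 'model'))

t = torch.randn(16, 128).to('xla')         # explicit, eager host-to-device transfer
u = torch.randn(16, 128).to('xla')
xs.mark_sharding(u, mesh, ('data', None))  # sharding also places the shards on their devices right away
```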

enhancement
distributed

Currently [`get_op_sharding`](https://github.com/pytorch/xla/blob/r2.7/torch_xla/distributed/spmd/xla_sharding.py#L116) generates an `xla::OpSharding`. With the new `torch_xla::OpSharding` abstraction, we will want to use it instead.
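
A small sketch of the current call path, assuming a `Mesh` built as in the example above; today the returned object is backed by `xla::OpSharding`:

```python
import numpy as np
import torch_xla.runtime as xr
import torch_xla.distributed.spmd as xs

num_devices = xr.global_runtime_device_count()
mesh = xs.Mesh(np.arange(num_devices), (num_devices,), ('data',))

# Currently this hands back a binding over xla::OpSharding; after the refactor
# it would be expressed in terms of torch_xla::OpSharding instead.
op_sharding = mesh.get_op_sharding(('data',))
print(type(op_sharding))
```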

enhancement
distributed

This is primarily for the sake of documentation and consistency.

distributed
documentation

Once we have refactored `mark_sharding` to utilize `torch_xla::OpSharding`, we will leverage it to implement Local SPMD. Through it, we will store the correct global device association, and pass it to...

enhancement
distributed

EDIT: Rather than creating a new RFC, I have decided to expand this GitHub issue with more information on achieving Local SPMD. ## Context Previous work has been done to...

enhancement
distributed

Currently `scripts/update_deps.py` does not update the Bazel version. We should consider changing this so that the weekly pin updates also update Bazel.
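
A hypothetical sketch of what such a step could look like, assuming the pinned Bazel version is tracked in a top-level `.bazelversion` file; the actual structure of `scripts/update_deps.py` may differ:

```python
from pathlib import Path

def update_bazel_version(new_version: str, repo_root: Path = Path('.')) -> None:
    """Overwrite .bazelversion with the newly pinned Bazel release."""
    bazelversion = repo_root / '.bazelversion'
    old = bazelversion.read_text().strip() if bazelversion.exists() else None
    if old == new_version:
        return  # already up to date
    bazelversion.write_text(new_version + '\n')
    print(f'Bumped Bazel pin: {old} -> {new_version}')

if __name__ == '__main__':
    update_bazel_version('7.4.1')  # illustrative version only
```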

dependencies
build

WORKSPACE is being migrated to Bzlmod (see the [Bazel migration guide](https://bazel.build/external/migration)). The WORKSPACE file is already disabled in Bazel 8 (late 2024) and will be removed in Bazel 9 (late...

dependencies
build

As part of the `xla::OpSharding` work, I found two instances where it was already being abstracted within PyTorch/XLA:
- tensor_common.h: `torch_xla::ShardingSpec`
- tensor.h/cpp: `ShardingSpec`

Both instances do basically the same thing,...

enhancement
distributed
usability

Currently there are many external APIs related to getting the number of devices associated with PyTorch/XLA. Those that I could find were:
- `global_runtime_device_count`: returns the total number of devices...
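
A quick sketch comparing a few of these APIs; only `global_runtime_device_count` is named in the text above, and the other calls are ones I believe `torch_xla.runtime` exposes, so treat them as assumptions:

```python
import torch_xla.runtime as xr

print('global runtime devices:', xr.global_runtime_device_count())  # all devices across the job
print('local devices:', xr.local_device_count())                    # devices attached to this host
print('addressable devices:', xr.addressable_device_count())        # devices this process can address
```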

usability
documentation