Ben Wilson

Results: 10 comments by Ben Wilson

Initial design concept: Currently, if an input tensor has data, `XlaMarkSharding` will extract it into a `cpu_tensor` and then use `CreateTensorsData`, along with the provided sharding spec, to load...
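The flow described above can be sketched as a toy model (every name here is a hypothetical stand-in for illustration, not the actual torch_xla C++ API):

```python
# Toy model of the described flow: pull the tensor's existing data back to
# the CPU, then re-load it according to a sharding spec.
# All functions here are hypothetical illustrations.

def shard_rows(cpu_data, num_devices):
    """Hypothetical stand-in for CreateTensorsData plus a sharding spec:
    distribute a list of rows round-robin across `num_devices` devices."""
    shards = [[] for _ in range(num_devices)]
    for i, row in enumerate(cpu_data):
        shards[i % num_devices].append(row)
    return shards

def mark_sharding(tensor_data, num_devices):
    """Hypothetical XlaMarkSharding analogue: extract the existing data
    into a cpu_tensor, then create sharded per-device data from it."""
    cpu_tensor = list(tensor_data)  # "extract this to a cpu_tensor"
    return shard_rows(cpu_tensor, num_devices)

shards = mark_sharding([[1], [2], [3], [4]], num_devices=2)
print(shards)  # two per-device shards
```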

Possible issue with the above design: As noted above, we should only transfer the data to the device once. When execution is requested, we need to initiate and complete the transfer,...

On further investigation, we might not need (or want) to use a new IR value after all. An `XLATensor::Data` can be constructed by:

* A `BackendDataPtr`, representing either a placeholder...
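A rough Python model of that backing choice (hypothetical names mirroring the description, not the real C++ types): a tensor's `Data` is constructed from either a backend data handle, which may be a placeholder with no real buffer yet, or an IR value.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class BackendData:
    """Toy BackendDataPtr: may be a placeholder (no real buffer yet)."""
    buffer: Optional[bytes] = None

    @property
    def is_placeholder(self):
        return self.buffer is None

@dataclass
class IrValue:
    """Toy stand-in for a lazy IR value."""
    op: str

class TensorData:
    """Toy XLATensor::Data: constructed from exactly one backing source."""
    def __init__(self, backend_data=None, ir_value=None):
        assert (backend_data is None) != (ir_value is None), \
            "must be backed by either device data or an IR value"
        self.backend_data = backend_data
        self.ir_value = ir_value

placeholder = TensorData(backend_data=BackendData())
print(placeholder.backend_data.is_placeholder)  # True
```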

The base class for `torch::lazy::LazyGraphExecutor` accepts an argument for `sync_ltc_data`, but the torch_xla implementation, `XlaGraphExecutor`, makes use of two arguments: `sync_ltc_data` and `warm_up_cache_only`. The possible combinations of these arguments are:...
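Since the two arguments are independent booleans, there are four combinations to consider. A toy enumeration (the class name and any behavior labels here are illustrative only, not the real executor config):

```python
from dataclasses import dataclass
from itertools import product

@dataclass(frozen=True)
class SyncTensorsConfig:
    """Toy model of the two executor flags discussed above.
    Field names mirror the comment; semantics are not modeled here."""
    sync_ltc_data: bool
    warm_up_cache_only: bool

# Enumerate the four possible combinations of the two boolean arguments.
combos = [SyncTensorsConfig(s, w) for s, w in product([True, False], repeat=2)]
for c in combos:
    print(c)
```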

> What would happen if data is not synced to device and you try to execute the graph anyway?

The lazy aten operations are defined using `XlaNodes`, which are always...

What this implies for the design here:

* We can add an API option to allow `mark_sharding` to be called with `lazy=True` from Python, and pass this through to `XlaMarkSharding`...
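If such an option were added, it might look like the following. This is a proposed, hypothetical signature and a toy implementation that merely records the spec and defers any device transfer, not the current torch_xla API:

```python
class ToyTensor:
    """Minimal stand-in for an XLA tensor tracked on the Python side."""
    def __init__(self, data):
        self.data = data
        self.sharding_spec = None
        self.on_device = False

def mark_sharding(tensor, sharding_spec, lazy=False):
    """Hypothetical mark_sharding with the proposed `lazy` flag:
    when lazy=True, attach the spec but defer the device transfer."""
    tensor.sharding_spec = sharding_spec
    if not lazy:
        tensor.on_device = True  # eager path: transfer immediately
    return tensor

t = mark_sharding(ToyTensor([1, 2, 3]), sharding_spec="rows", lazy=True)
print(t.sharding_spec, t.on_device)  # rows False
```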

Complication: `XlaMarkSharding` is an in-place operation. The input is a `const at::Tensor&`, which is unwrapped into an `XLATensorPtr` that is modified in place. There is a setter, `XLATensor::SetTensor`, which is...

The prior complication is mostly a non-issue; the intended API is for `torch_xla.runtime.use_spmd()` to be called before any XLA tensors are initialized, and an existing warning covers this. If users try...

Follow-up to my prior comment:

> I'm not sure if this suits AWS's requirements. If the tracing produces new IR nodes, then even if loading is delayed in mark_sharding, it...

Supporting meta initialization of the IR graph introduces a new problem: un-executable IR graphs. The current implementation enforces that all `DeviceData` nodes are backed by real data (as `PjRtBuffer`s). The...
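The problem can be illustrated with a toy executability check (hypothetical names, sketching the invariant rather than the real implementation): if any `DeviceData` node in the graph is backed by a placeholder instead of a real buffer, the graph cannot be executed until that data is materialized.

```python
class DeviceData:
    """Toy DeviceData IR node; `buffer` is None for a placeholder."""
    def __init__(self, buffer=None):
        self.buffer = buffer

def is_executable(graph_nodes):
    """The graph is executable only if every DeviceData node has real
    data (in the real implementation, a PjRtBuffer)."""
    return all(n.buffer is not None for n in graph_nodes
               if isinstance(n, DeviceData))

nodes = [DeviceData(buffer=b"real"), DeviceData()]  # second is a placeholder
print(is_executable(nodes))  # False
```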