jonb377
See also: https://github.com/pytorch/xla/issues/6546

The optimizer state must be primed before it can be restored. Optimizer state isn't materialized until the first `optim.step` call, so to restore optimizer state before resuming...
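The priming requirement can be sketched in plain PyTorch (a minimal sketch, not the torch_xla checkpointing API; the model, optimizer, and checkpoint path are illustrative):

```python
import torch

# Adam's per-parameter state (exp_avg, exp_avg_sq) does not exist until
# the first step() call, so a freshly constructed optimizer has nothing
# to restore a checkpoint into.
model = torch.nn.Linear(4, 4)
optim = torch.optim.Adam(model.parameters())
assert len(optim.state) == 0  # no state materialized yet

# "Prime" the optimizer: run one step with zero gradients so the state
# tensors are allocated. With zero grads, Adam's update is zero, so the
# parameters are left unchanged.
for p in model.parameters():
    p.grad = torch.zeros_like(p)
optim.step()
assert len(optim.state) > 0  # state now exists and can be restored into

# A checkpointed optimizer state_dict can now be loaded, e.g.
# optim.load_state_dict(torch.load("optim_ckpt.pt"))  # path illustrative
```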
## Fixes / Features
- `debug-dump-gcs` doesn't need to be mutually exclusive with an environment-specified `XLA_FLAGS`.

## Testing / Documentation
Testing details:
- `xpk workload create ... --debug-dump-gcs gs://foo/bar --env XLA_FLAGS=--xla_dump_to=/foo/bar` =>...
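A minimal sketch of the intended merge behavior (the helper name and flag values are hypothetical, not xpk's actual implementation): rather than rejecting the combination, the dump flag is appended to whatever `XLA_FLAGS` the user already supplied via `--env`.

```python
def merge_xla_flags(user_env, dump_dir):
    # Hypothetical helper: append the --xla_dump_to flag implied by
    # --debug-dump-gcs to any existing user-specified XLA_FLAGS, instead
    # of treating the two sources as mutually exclusive.
    env = dict(user_env)
    existing = env.get("XLA_FLAGS", "")
    env["XLA_FLAGS"] = f"{existing} --xla_dump_to={dump_dir}".strip()
    return env

# User-specified flags survive alongside the dump flag (values illustrative).
merged = merge_xla_flags(
    {"XLA_FLAGS": "--xla_force_host_platform_device_count=8"},
    "/tmp/xla_dump",
)
print(merged["XLA_FLAGS"])
# -> --xla_force_host_platform_device_count=8 --xla_dump_to=/tmp/xla_dump
```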
Nightly tests are failing because `jax.distributed` is not initialized in the synchronous checkpointing case.