Victor Barr
Improvements: 1. Use an enum for the accelerator type. 2. Remove the device-type checks based on --tpu-type / --device-type that are scattered everywhere; do this in one place. 3. Remove h100 device specific...
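A minimal sketch of the first two improvement items, assuming the accelerator kind can be decided once from the two flag values. The names `AcceleratorType`, `resolve_accelerator_type`, and the `HYPOTHETICAL_DEVICE_TYPES` table are illustrative only and are not taken from the xpk codebase.

```python
import enum
from typing import Optional


class AcceleratorType(enum.Enum):
    """Hypothetical enum replacing ad hoc string checks."""
    TPU = "tpu"
    GPU = "gpu"
    CPU = "cpu"


# Hypothetical lookup table; the real device names are not given in the source.
HYPOTHETICAL_DEVICE_TYPES = {
    "example-gpu-8": AcceleratorType.GPU,
    "example-cpu-4": AcceleratorType.CPU,
}


def resolve_accelerator_type(
    tpu_type: Optional[str], device_type: Optional[str]
) -> AcceleratorType:
    """Decide the accelerator type in one place from the two flag values."""
    if tpu_type:
        # Assumption: any value passed via --tpu-type means a TPU.
        return AcceleratorType.TPU
    if not device_type:
        raise ValueError("one of --tpu-type or --device-type is required")
    try:
        return HYPOTHETICAL_DEVICE_TYPES[device_type]
    except KeyError:
        raise ValueError(f"unknown device type: {device_type!r}") from None
```

Callers would then branch on the returned enum member instead of re-inspecting the raw flags.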
## Fixes / Features - Supports tpu-topology flag for specifying custom topologies for TPUs - ## Testing / Documentation Added a check to make sure the format of topologies fits...
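A format check of this kind could look roughly like the sketch below. The excerpt is cut off before it states the expected format, so the `AxB` / `AxBxC` shape used here (for example `2x2x4`) is an assumption, and `parse_topology` is a hypothetical helper rather than the function added in the PR.

```python
import re
from typing import Tuple

# Assumed format: two or three positive integers separated by "x".
_TOPOLOGY_PATTERN = re.compile(r"[1-9]\d*(x[1-9]\d*){1,2}")


def parse_topology(value: str) -> Tuple[int, ...]:
    """Validate a topology string and return its dimensions as integers."""
    if not _TOPOLOGY_PATTERN.fullmatch(value):
        raise ValueError(
            f"invalid topology {value!r}; expected a form like '2x2' or '2x2x4'"
        )
    return tuple(int(dim) for dim in value.split("x"))
```

Rejecting malformed values at argument-parsing time keeps the error close to the flag that caused it.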
xpk currently supports one cluster queue / local queue. If an xpk cluster administrator wants to split capacity between different use cases, they would currently have to create separate xpk...
If this is possible, we could create an automatic release to pip as part of a GitHub workflow. It would be great to connect this to the release pipeline. PR ->...
## Fixes / Features - Corrects v5litepod terminology to v5e. ## Testing / Documentation Got the error when creating cluster with v5litepod. Was able to continue in cluster creation with...
…eation / deletion ## Fixes / Features - - ## Testing / Documentation Testing details. - [ y/n ] Tests pass - [ y/n ] Appropriate changes to documentation are...
allow_split_physical_axes is currently only supported for device meshes, but we should also support it for hybrid meshes. This is useful when we want to use FSDP across DCN and ICI...