Alexander Zhipa

35 issues by Alexander Zhipa

See #1037.
Test plan:
- [x] updated unit tests

CLA Signed

## Description
Using `replicas` for repetitive pod configuration in `kubernetes_scheduler` was removed in https://github.com/pytorch/torchx/commit/f6907e8c089208545b95f1b8967278e399006a47. The rationale is [here](https://github.com/pytorch/torchx/blame/41be1d8e97825151482323faabf5cdfcdd00f973/torchx/schedulers/kubernetes_scheduler.py#L378-L380). Unfortunately, for a large setup we can easily breach default limits,...

## Description
`to_dict`, which is used to parse `env` key-value pairs, does not allow `=` as part of a value.
## Motivation/Background
This means we cannot set values for `NCCL_IB_HCA` for...
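A minimal sketch of a parser that tolerates `=` inside values, assuming the `env` argument is a comma-separated list of `KEY=VALUE` pairs. `parse_env` is a hypothetical name, not TorchX's actual `to_dict`; the idea is simply to split each pair on the *first* `=` only (via `str.partition`), so everything after it is kept as the value. Values that themselves contain `,` are out of scope for this sketch.

```python
from typing import Dict


def parse_env(arg: str) -> Dict[str, str]:
    """Parse 'KEY1=VAL1,KEY2=VAL2' into a dict (hypothetical helper).

    Each pair is split on the FIRST '=' only, so values may contain '='.
    Values containing ',' are not supported by this sketch.
    """
    env: Dict[str, str] = {}
    if not arg:
        return env
    for pair in arg.split(","):
        # partition() splits at the first '=' and keeps the rest intact.
        key, sep, value = pair.partition("=")
        if not sep or not key.strip():
            raise ValueError(f"expected KEY=VALUE, got {pair!r}")
        env[key.strip()] = value
    return env
```

For example, `parse_env("NCCL_IB_HCA==mlx5_0,FOO=a=b")` returns `{"NCCL_IB_HCA": "=mlx5_0", "FOO": "a=b"}`; `=mlx5_0` is only an illustrative value here.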

## Description
Currently we can use either the default schedulers or the custom ones (from one of the packages), but not both: https://github.com/pytorch/torchx/blob/main/torchx/schedulers/__init__.py#L59
## Motivation/Background
This will allow adding new schedulers locally without...
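To make the either/or limitation concrete, here is a sketch contrasting the behavior described above with the proposed merge. Everything here is hypothetical: `SchedulerFactory`, `current_behavior` and `merge_scheduler_factories` are illustrative names, not TorchX APIs, and factories are modeled as plain callables keyed by scheduler name.

```python
from typing import Callable, Dict, Mapping

# Hypothetical: a factory takes a session name and returns a scheduler object.
SchedulerFactory = Callable[[str], object]


def current_behavior(
    defaults: Mapping[str, SchedulerFactory],
    custom: Mapping[str, SchedulerFactory],
) -> Dict[str, SchedulerFactory]:
    # Either/or, as described in the issue: any custom factory hides all defaults.
    return dict(custom) if custom else dict(defaults)


def merge_scheduler_factories(
    defaults: Mapping[str, SchedulerFactory],
    custom: Mapping[str, SchedulerFactory],
) -> Dict[str, SchedulerFactory]:
    # Proposed: keep the defaults and add custom factories on top;
    # on a name clash the custom factory wins.
    merged: Dict[str, SchedulerFactory] = dict(defaults)
    merged.update(custom)
    return merged
```

With defaults keyed `local_cwd` and `kubernetes` and a hypothetical custom `my_scheduler`, `current_behavior` exposes only `my_scheduler`, while `merge_scheduler_factories` exposes all three.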

## Description
TorchX already supports App `metadata`. Unfortunately, unlike `env`, there is no way to pass `metadata` via torchxconfig or the CLI.
## Motivation/Background
While the implementation is scheduler-specific and not all...
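One way CLI and config support could fit together is a simple precedence merge. This is a sketch under assumptions: `AppDef` below is a minimal stand-in dataclass (not an import of `torchx.specs.AppDef`), `apply_metadata` is a hypothetical helper, and the precedence order (existing app metadata, then torchxconfig, then CLI) is a design choice, not documented TorchX behavior.

```python
from dataclasses import dataclass, field
from typing import Dict, Mapping, Optional


@dataclass
class AppDef:
    """Minimal stand-in for an app definition; only the fields used here."""
    name: str
    metadata: Dict[str, str] = field(default_factory=dict)


def apply_metadata(
    app: AppDef,
    config_metadata: Optional[Mapping[str, str]] = None,
    cli_metadata: Optional[Mapping[str, str]] = None,
) -> AppDef:
    """Merge config and CLI metadata into app.metadata (hypothetical helper).

    Precedence, lowest to highest: values already on the app,
    then config, then CLI, so a CLI flag overrides the config file.
    """
    merged: Dict[str, str] = dict(app.metadata)
    merged.update(config_metadata or {})
    merged.update(cli_metadata or {})
    app.metadata = merged
    return app
```

For example, an app created with `{"team": "ml"}`, config metadata `{"team": "infra", "queue": "default"}` and CLI metadata `{"queue": "high"}` ends up with `{"team": "infra", "queue": "high"}`. The ordering lets a one-off CLI flag override a shared config value without editing the config file.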