Tristan Rice
## Description As part of the future experiment tracking we want the application to know its own identity. When we launch a job we return the...
## Description Currently we require the ray dashboard_address for communicating with the ray head node. Since runopts are only available during scheduling we embed it into the app_id as a...
## Description Add a `torchx list` command and `Runner/Scheduler.list` methods. This would allow listing all jobs the user has launched and seeing their status when tracking multiple different jobs. ## Motivation/Background...
Summary: This adds a new `metadata_urls` to AppStatus and `metadata` to DescribeAppResponse. Metadata URLs are automatically returned from metadata values if they start with `http://` or `https://`. This isn't quite...
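The URL rule in the summary above (a metadata value counts as a URL when it starts with `http://` or `https://`) can be sketched roughly as follows. This is only an illustration of the prefix check: the function name `extract_metadata_urls` and the plain `str`-to-`str` mapping are assumptions made for the example, not the actual TorchX implementation.

```python
from typing import Dict, Mapping

# Hypothetical helper (name and signature are assumptions, not TorchX API):
# keep only the metadata entries whose value looks like an HTTP(S) URL.
URL_PREFIXES = ("http://", "https://")


def extract_metadata_urls(metadata: Mapping[str, str]) -> Dict[str, str]:
    """Return the subset of metadata whose values start with http:// or https://."""
    return {
        key: value
        for key, value in metadata.items()
        if value.startswith(URL_PREFIXES)
    }


if __name__ == "__main__":
    example = {
        "tensorboard": "https://example.com/tb/123",
        "owner": "someone",
        "logs": "http://example.com/logs/123",
    }
    print(extract_metadata_urls(example))
```

A plain prefix check like this is cheap and predictable, but it does not validate that the value is a well-formed URL.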
## 📚 Documentation ## Link https://pytorch.org/torchx/main/runner.config.html Context: https://fb.workplace.com/groups/140700188041197/posts/326515519459662/?comment_id=328106399300574&reply_comment_id=328113552633192 ## What does it currently say? ``` The CLI only picks up .torchxconfig files from the current-working-directory (CWD) so chose a directory...
## 🐛 Bug Module (check all that applies): * [x] `torchx.workspace` ## To Reproduce For large projects users may only care about a subset of the files. This means...
## Description We currently have TypedDict typings for all of the scheduler configs. We should update AppDryRunInfo to use that instead of the generic Mapping[str, CfgVal] types. This should help...
## Description When users are launching TorchX jobs they often want a way to provide links to view the results on external services. This commonly includes things like Tensorboard. We...
It would be nice to have Azure support so we can directly launch jobs on Azure's ML training service. Example scheduler: AWS Batch https://github.com/pytorch/torchx/blob/main/torchx/schedulers/aws_batch_scheduler.py Scheduler documentation: https://pytorch.org/torchx/main/schedulers Azure Docs: *...
This adds in TPU support when launching jobs via the `kubernetes_scheduler` on GKE. * Volcano doesn't understand the Kubernetes device plugin so when launching we need to set `minAvailable: 0`...