Tristan Rice

138 issues by Tristan Rice

## Description As part of the future experiment tracking work, we want the application to be able to know its own identity. When we launch a job we return the...

enhancement
module: runner
tracking

## Description Currently we require the Ray `dashboard_address` for communicating with the Ray head node. Since runopts are only available during scheduling, we embed it into the app_id as a...

enhancement
ray

## Description Add a `torchx list` command and `Runner/Scheduler.list` methods. This would let users list all the jobs they have launched and see their status when tracking multiple jobs. ## Motivation/Background...

enhancement
module: runner
cli

Summary: This adds a new `metadata_urls` field to AppStatus and a `metadata` field to DescribeAppResponse. Metadata URLs are automatically populated from metadata values that start with `http://` or `https://`. This isn't quite...

CLA Signed
fb-exported
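The URL rule in the summary above can be illustrated with a minimal sketch. It assumes metadata is a plain `Dict[str, str]` and that `metadata_urls` keeps the same keys. The summary does not specify the field's shape, so that part is an assumption. The helper name `extract_metadata_urls` is hypothetical and not part of the TorchX API.

```python
from typing import Dict


# Hypothetical helper: keep only metadata values that look like web links,
# i.e. values starting with http:// or https://. The dict-keyed output shape
# is an assumption, not the actual AppStatus.metadata_urls type.
def extract_metadata_urls(metadata: Dict[str, str]) -> Dict[str, str]:
    return {
        key: value
        for key, value in metadata.items()
        if value.startswith(("http://", "https://"))
    }


if __name__ == "__main__":
    metadata = {
        "tensorboard": "https://example.com/tb/run1",
        "checkpoint_dir": "/tmp/ckpt",
        "dashboard": "http://localhost:8265",
    }
    # Only the two http(s) values are kept; the local path is dropped.
    print(extract_metadata_urls(metadata))
```

Checking the prefix with a tuple passed to `str.startswith` keeps the rule to a single expression, so non-URL values such as local paths stay in `metadata` without also appearing as links.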

## 📚 Documentation ## Link https://pytorch.org/torchx/main/runner.config.html Context: https://fb.workplace.com/groups/140700188041197/posts/326515519459662/?comment_id=328106399300574&reply_comment_id=328113552633192 ## What does it currently say? ``` The CLI only picks up .torchxconfig files from the current-working-directory (CWD) so chose a directory...

documentation

## 🐛 Bug Module (check all that apply): * [x] `torchx.workspace` ## To Reproduce For large projects, users may only care about a subset of the files. This means...

bug
docker

## Description We currently have TypedDict typings for all of the scheduler configs. We should update AppDryRunInfo to use those instead of the generic `Mapping[str, CfgVal]` type. This should help...

enhancement
module: specs
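The difference between a `TypedDict` config and a generic `Mapping[str, CfgVal]` can be sketched as below. Every name here is a hypothetical stand-in: `DryRunInfo` is not TorchX's actual `AppDryRunInfo`, `ExampleSchedulerOpts` is an invented config, and `CfgVal` only approximates the real type. At runtime both configs are ordinary dicts. The gain is that a static type checker such as mypy knows the value type of each key in the typed version.

```python
from dataclasses import dataclass
from typing import Generic, List, Mapping, Optional, TypedDict, TypeVar, Union

# Assumed stand-in for a generic config value type; the real CfgVal may differ.
CfgVal = Union[str, int, float, bool, List[str], None]


# Hypothetical per-scheduler config described with a TypedDict.
class ExampleSchedulerOpts(TypedDict, total=False):
    queue: str
    log_dir: Optional[str]


C = TypeVar("C")


@dataclass
class DryRunInfo(Generic[C]):
    """Hypothetical dry-run container parameterized by its config type."""

    request: str
    cfg: C


# Before: every lookup on the config is typed as the broad CfgVal union.
untyped: DryRunInfo[Mapping[str, CfgVal]] = DryRunInfo(
    request="submit", cfg={"queue": "default"}
)

# After: a type checker knows cfg["queue"] is a str and rejects unknown keys.
typed: DryRunInfo[ExampleSchedulerOpts] = DryRunInfo(
    request="submit", cfg={"queue": "default", "log_dir": None}
)

queue: str = typed.cfg["queue"]
print(queue)
```

Parameterizing the container with a `TypeVar` lets each scheduler supply its own `TypedDict` without duplicating the dry-run class.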

## Description When users launch TorchX jobs, they often want a way to provide links for viewing the results on external services. This commonly includes things like TensorBoard. We...

enhancement
module: runner

It would be nice to have Azure support so we can directly launch jobs on Azure's ML training service. Example scheduler: AWS Batch https://github.com/pytorch/torchx/blob/main/torchx/schedulers/aws_batch_scheduler.py Scheduler documentation: https://pytorch.org/torchx/main/schedulers Azure Docs: *...

enhancement
module: runner
scheduler-request

This adds TPU support when launching jobs via the `kubernetes_scheduler` on GKE. * Volcano doesn't understand the Kubernetes device plugin, so when launching we need to set `minAvailable: 0`...

CLA Signed