Tristan Rice issues

Results: 138 issues of Tristan Rice

add a TORCHX_JOB_ID environment variable to all jobs launched via runner

## Description As part of the future experiment tracking we want the application to be able to know its own identity. When we launch a job we return the...

enhancement

module: runner

tracking
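
A minimal sketch of how an application might read the proposed variable, assuming the runner sets `TORCHX_JOB_ID` in the job's environment as this issue requests. The helper name `current_job_id` and the `"unknown"` fallback are hypothetical and not part of TorchX.

```python
import os
from typing import Mapping, Optional


def current_job_id(
    env: Optional[Mapping[str, str]] = None, default: str = "unknown"
) -> str:
    """Return the job identity from TORCHX_JOB_ID, or a fallback if unset.

    `env` defaults to the process environment; passing a mapping makes the
    function easy to exercise outside a launched job.
    """
    env = os.environ if env is None else env
    # Assumption: the runner exports the job handle under this name.
    return env.get("TORCHX_JOB_ID", default)
```

Inside a job, `current_job_id()` could then be used to tag logs or tracking records with the job's own handle.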

eliminate ray dashboard address from runopts

## Description Currently we require the ray dashboard_address for communicating with the ray head node. Since runopts are only available during scheduling we embed it into the app_id as a...

enhancement

ray

add `torchx list` command and `Runner.list` APIs

## Description Add a `torchx list` command and `Runner/Scheduler.list` methods. This would allow users to list all the jobs they have launched and see their status when tracking multiple different jobs. ## Motivation/Background...

enhancement

module: runner

cli

torchx: add metadata_urls to allow printing user links from the CLI

Summary: This adds a new `metadata_urls` to AppStatus and `metadata` to DescribeAppResponse. Metadata URLs are automatically returned from metadata values if they start with `http://` or `https://`. This isn't quite...

CLA Signed

fb-exported
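
The URL rule described in this summary can be sketched as a simple prefix filter. This is an illustration only, not the code from the PR: the function name `extract_metadata_urls` and the plain-dict metadata shape are assumptions.

```python
from typing import Dict, Mapping


def extract_metadata_urls(metadata: Mapping[str, object]) -> Dict[str, str]:
    """Keep only metadata entries whose value is an http(s) URL string."""
    return {
        key: value
        for key, value in metadata.items()
        # Non-string values are skipped; str.startswith accepts a tuple of prefixes.
        if isinstance(value, str) and value.startswith(("http://", "https://"))
    }
```

For example, a metadata mapping holding a dashboard link and a numeric retry count would yield only the dashboard link.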

Document .torchxconfig behavior in home directory

## 📚 Documentation ## Link https://pytorch.org/torchx/main/runner.config.html Context: https://fb.workplace.com/groups/140700188041197/posts/326515519459662/?comment_id=328106399300574&reply_comment_id=328113552633192 ## What does it currently say? ``` The CLI only picks up .torchxconfig files from the current-working-directory (CWD) so chose a directory...

documentation

DockerWorkspace: building large projects can be slower than ideal

## 🐛 Bug Module (check all that apply): * [x] `torchx.workspace` ## To Reproduce For large projects users may only care about a subset of the files. This means...

bug

docker

specs: type AppDryRunInfo _cfg

## Description We currently have TypedDict typings for all of the scheduler configs. We should update AppDryRunInfo to use that instead of the generic Mapping[str, CfgVal] types. This should help...

enhancement

module: specs

job launch hooks for linking to external services such as tensorboard

## Description When users are launching TorchX jobs they often want a way to provide links to view the results on external services. This commonly includes things like Tensorboard. We...

enhancement

module: runner

Add Azure Scheduler Support

It would be nice to have Azure support so we can directly launch jobs on Azure's ML training service. Example scheduler: AWS Batch https://github.com/pytorch/torchx/blob/main/torchx/schedulers/aws_batch_scheduler.py Scheduler documentation: https://pytorch.org/torchx/main/schedulers Azure Docs: *...

enhancement

module: runner

scheduler-request

torchx/specs,schedulers: add TPU named resources + support in kubernetes_scheduler

This adds in TPU support when launching jobs via the `kubernetes_scheduler` on GKE. * Volcano doesn't understand the Kubernetes device plugin so when launching we need to set `minAvailable: 0`...

CLA Signed