torchx

TorchX is a universal job launcher for PyTorch applications. It is designed for fast iteration during training and research, and it supports end-to-end production ML pipelines when you're ready.

Results 135 torchx issues

See https://github.com/pytorch/torchx/actions/workflows/components-integration-tests.yaml

See: https://github.com/pytorch/torchx/actions/workflows/kfp-integration-tests.yaml

## Description
I’m currently working with TorchX in conjunction with Volcano scheduling for my training jobs on an Amazon EKS cluster. I’ve also integrated the Karpenter autoscaler for effective node scaling....

## ❓ Questions and Help
### Question
Hi, could anyone provide the script to run PyTorch DDP training on IBM LSF?

## 🐛 Bug
Module (check all that applies):
* [ ] `torchx.spec`
* [ ] `torchx.component`
* [ ] `torchx.apps`
* [ ] `torchx.runtime`
* [x] `torchx.cli`
* [ ]...

## 📚 Documentation
## Link
[https://pytorch.org/torchx/latest/components/distributed.html](https://pytorch.org/torchx/latest/components/distributed.html)
## What does it currently say?
It is not clear whether the `--cpu` and `--gpu` arguments are overridden by the `-j` argument, although in my testing (launch then run...

## Description
Add support for [HashiCorp Nomad](https://www.nomadproject.io/) as a scheduler.
## Motivation/Background
Nomad has a good scheduler, and PyTorch has good distributed training. However, Nomad launches batch job tasks asynchronously...

## Description
Switch the static type checker to mypy and include mypy-compatible type stubs (PEP 561 compliant) by adding a `py.typed` file at the root of the `torchx` module (see https://mypy.readthedocs.io/en/stable/installed_packages.html#creating-pep-561-compatible-packages)....

This adds a new `runopts.from_typed_dict` method and uses it to generate the runopts from the typed dict field, annotations, default parameters and docstring. This simplifies adding new fields to schedulers...
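
The general idea of deriving option specs from a typed dict can be sketched in plain Python. This is only an illustration of the technique, not the TorchX implementation: `ExampleOpts`, `EXAMPLE_DEFAULTS`, and `opts_from_typed_dict` are hypothetical names, and the real `runopts.from_typed_dict` signature and behavior are not shown in the snippet above.

```python
from typing import Any, Dict, TypedDict, get_type_hints


# Hypothetical scheduler options, for illustration only.
class ExampleOpts(TypedDict, total=False):
    queue: str
    priority: int
    dry_run: bool


# Hypothetical defaults; a key with no default is treated as required.
EXAMPLE_DEFAULTS: Dict[str, Any] = {"priority": 0, "dry_run": False}


def opts_from_typed_dict(
    td: type, defaults: Dict[str, Any]
) -> Dict[str, Dict[str, Any]]:
    """Build a simple option spec from a TypedDict's field annotations."""
    specs: Dict[str, Dict[str, Any]] = {}
    # get_type_hints resolves the annotations declared on the TypedDict.
    for name, typ in get_type_hints(td).items():
        specs[name] = {
            "type": typ,
            "default": defaults.get(name),
            "required": name not in defaults,
        }
    return specs


specs = opts_from_typed_dict(ExampleOpts, EXAMPLE_DEFAULTS)
for name, spec in specs.items():
    print(name, spec["type"].__name__, spec["default"], spec["required"])
```

With this approach, adding a new option means adding one annotated field (and optionally a default) rather than writing a separate registration call per option.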

CLA Signed

## 🐛 Bug
Module (check all that applies):
* [ ] `torchx.spec`
* [ ] `torchx.component`
* [ ] `torchx.apps`
* [ ] `torchx.runtime`
* [ ] `torchx.cli`
* [...

bug
kubernetes