
add a TORCHX_JOB_ID environment variable to all jobs launched via runner

Open · d4l3k opened this issue 2 years ago · 0 comments

Description

As part of future experiment-tracking work, we want an application to be able to know its own identity. When we launch a job, we return the full job ID (e.g. kubernetes://session/app_id), but the app itself does not have access to that exact job ID. We do provide an app_id macro that can be used in the app def for both env and arguments, but it is up to the app owner to add it manually.

Motivation/Background

Adding a TORCHX_JOB_ID environment variable lets us write more standardized experiment-tracking integrations that use the job ID as a key. An extra environment variable costs essentially nothing, and it will enable deeper automatic integrations with other libraries.
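To illustrate the kind of integration this would enable, here is a minimal sketch of how a tracking library might read the proposed variable. It assumes the value uses the same scheduler://session/app_id format that the runner already returns at launch. `JobId`, `parse_job_id`, and `current_job_id` are hypothetical helpers for illustration, not part of TorchX.

```python
import os
from typing import NamedTuple, Optional


class JobId(NamedTuple):
    """Components of a job handle of the form scheduler://session/app_id."""
    scheduler: str
    session_name: str
    app_id: str


def parse_job_id(job_id: str) -> JobId:
    # Split "scheduler://session/app_id" into its three parts.
    scheduler, sep, rest = job_id.partition("://")
    if not sep or not scheduler:
        raise ValueError(f"not a job handle: {job_id!r}")
    session_name, sep, app_id = rest.partition("/")
    if not sep or not session_name or not app_id:
        raise ValueError(f"not a job handle: {job_id!r}")
    return JobId(scheduler, session_name, app_id)


def current_job_id() -> Optional[JobId]:
    # Hypothetical: read the proposed TORCHX_JOB_ID variable.
    # Returns None when it is absent, e.g. for a process not launched via the runner.
    raw = os.environ.get("TORCHX_JOB_ID")
    return parse_job_id(raw) if raw else None
```

A tracker could then key its runs on `current_job_id()` and fall back to its own run naming when it returns `None`.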

Detailed Proposal

Add a new environment variable to Runner.dryrun

https://github.com/pytorch/torchx/blob/main/torchx/runner/api.py#L241

that uses macros.app_id, together with the scheduler and session information from the runner, to construct the full job ID.

https://github.com/pytorch/torchx/blob/main/torchx/specs/api.py#L156
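A minimal sketch of the proposed change, assuming the runner knows the scheduler and session name at dryrun time while the concrete app_id is filled in later by substituting the existing `${app_id}` macro. `Role`, `add_job_id_env`, and `substitute_app_id` are hypothetical stand-ins in plain Python, not TorchX's actual classes or functions.

```python
import string
from dataclasses import dataclass, field
from typing import Dict, List

# Same placeholder string that the app_id macro expands from.
APP_ID_MACRO = "${app_id}"


@dataclass
class Role:
    # Hypothetical stand-in for a role in an app definition.
    name: str
    env: Dict[str, str] = field(default_factory=dict)


def add_job_id_env(roles: List[Role], scheduler: str, session_name: str) -> None:
    # Runner side (dryrun): scheduler and session are known, the app_id is
    # left as a macro so it can be resolved once it has been assigned.
    job_id = f"{scheduler}://{session_name}/{APP_ID_MACRO}"
    for role in roles:
        # Do not clobber a value the app owner already set explicitly.
        role.env.setdefault("TORCHX_JOB_ID", job_id)


def substitute_app_id(roles: List[Role], app_id: str) -> None:
    # Later step: replace ${app_id} in every env value with the real app_id.
    for role in roles:
        role.env = {
            key: string.Template(value).safe_substitute(app_id=app_id)
            for key, value in role.env.items()
        }
```

Using `setdefault` means an app that already sets TORCHX_JOB_ID itself keeps its own value; whether the runner should instead always override it is an open design choice.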

Alternatives

Additional context/links

d4l3k · Jul 22 '22 18:07