torchx
add a TORCHX_JOB_ID environment variable to all jobs launched via runner
Description
As part of the upcoming experiment-tracking work, we want an application to be able to know its own identity. When we launch a job we return the full job ID (e.g. kubernetes://session/app_id), but the app itself doesn't have this exact job ID. We do provide an app_id macro that can be used in the app def for both env and arguments, but it's up to the app owner to add it manually.
Motivation/Background
Adding a TORCHX_JOB_ID environment variable lets us write more standardized experiment-tracking integrations that use the job ID as a key. An extra environment variable adds no meaningful cost, and it enables deeper automatic integrations with other libraries.
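A minimal sketch of how a tracking integration might consume the variable from inside the app, assuming TORCHX_JOB_ID holds a string of the form scheduler://session/app_id like the example in the description. The helper names get_job_key and split_job_id are hypothetical and not part of torchx.

```python
import os
from typing import Mapping, Optional, Tuple


def get_job_key(env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return the full job ID to use as a tracking key, or None if unset.

    Hypothetical helper: relies on the proposed TORCHX_JOB_ID variable.
    """
    env = os.environ if env is None else env
    return env.get("TORCHX_JOB_ID")


def split_job_id(job_id: str) -> Tuple[str, str, str]:
    """Split 'scheduler://session/app_id' into (scheduler, session, app_id).

    Assumes the format shown in the issue; raises if the scheme separator is missing.
    """
    scheduler, sep, rest = job_id.partition("://")
    if not sep:
        raise ValueError(f"not a job ID: {job_id!r}")
    # Only the first "/" separates session from app_id
    session, _, app_id = rest.partition("/")
    return scheduler, session, app_id
```

For example, split_job_id("kubernetes://session/app_id") returns ("kubernetes", "session", "app_id"), so a tracker could key runs on the whole string or on just the app_id part.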
Detailed Proposal
Add a new environment variable in Runner.dryrun
https://github.com/pytorch/torchx/blob/main/torchx/runner/api.py#L241
that uses macros.app_id to build the full job ID from the runner's scheduler and session information.
https://github.com/pytorch/torchx/blob/main/torchx/specs/api.py#L156
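A rough sketch of the proposed injection step, assuming the app_id macro expands from the placeholder string "${app_id}" and that the scheduler substitutes it when it materializes the request. Role, AppDef, add_job_id_env and substitute_app_id here are simplified, hypothetical stand-ins, not the real torchx.specs classes or Runner internals.

```python
from dataclasses import dataclass, field
from string import Template
from typing import Dict, List

# Assumption: mirrors the placeholder behind torchx's macros.app_id.
APP_ID_MACRO = "${app_id}"
JOB_ID_ENV = "TORCHX_JOB_ID"


@dataclass
class Role:  # hypothetical stand-in for a torchx role
    name: str
    env: Dict[str, str] = field(default_factory=dict)


@dataclass
class AppDef:  # hypothetical stand-in for a torchx app definition
    name: str
    roles: List[Role]


def add_job_id_env(app: AppDef, scheduler: str, session: str) -> AppDef:
    """Set TORCHX_JOB_ID on every role; the app_id part stays a macro.

    In the proposal this would run inside Runner.dryrun, where the
    scheduler and session names are known but the app_id is not yet.
    """
    job_id = f"{scheduler}://{session}/{APP_ID_MACRO}"
    for role in app.roles:
        # setdefault keeps any value the app owner already set explicitly
        role.env.setdefault(JOB_ID_ENV, job_id)
    return app


def substitute_app_id(value: str, app_id: str) -> str:
    """Illustrates the later macro expansion a scheduler would perform."""
    return Template(value).safe_substitute(app_id=app_id)
```

Using setdefault is a design choice in this sketch: it avoids clobbering a TORCHX_JOB_ID that an app owner already wires up by hand with the existing macro.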