Kubernetes backend: Can't run hello world

Open knexator opened this issue 11 months ago • 0 comments

❓ Questions and Help

Question

Both of these work correctly:

uv run torchx run --scheduler local_cwd -cfg queue=default --workspace="" utils.echo
uv run torchx run --scheduler local_docker -cfg queue=default --workspace="" utils.echo

However, with the kubernetes backend I get:

torchx 2025-01-20 12:35:27 INFO     Launched app: kubernetes://torchx/default:echo-rlkrc4nxcxzxs
torchx 2025-01-20 12:35:27 INFO     AppStatus:
    State: UNKNOWN
    Num Restarts: -1
    Roles:
    Msg: <NONE>
    Structured Error Msg: <NONE>
    UI URL: None

I get the same UNKNOWN state when I run torchx status kubernetes://torchx/default:echo-rlkrc4nxcxzxs
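For reference, here is a minimal sketch of how the handle in the log above breaks down. It assumes the layout `<scheduler>://<session>/<namespace>:<name>`. `split_handle` and `KubernetesHandle` are hypothetical helpers written for illustration and are not part of the torchx API.

```python
from typing import NamedTuple


class KubernetesHandle(NamedTuple):
    """Parts of a handle, assuming <scheduler>://<session>/<namespace>:<name>."""
    scheduler: str
    session: str
    namespace: str
    name: str


def split_handle(handle: str) -> KubernetesHandle:
    """Hypothetical helper (not a torchx function): split a handle by plain string parsing."""
    scheduler, sep, rest = handle.partition("://")
    if not sep:
        raise ValueError(f"missing '://' in handle: {handle!r}")
    session, sep, app_id = rest.partition("/")
    if not sep:
        raise ValueError(f"missing '/' after session in handle: {handle!r}")
    # Assumption: for the kubernetes scheduler the app id is "<namespace>:<name>".
    namespace, sep, name = app_id.partition(":")
    if not sep:
        raise ValueError(f"missing ':' in app id: {app_id!r}")
    return KubernetesHandle(scheduler, session, namespace, name)


parts = split_handle("kubernetes://torchx/default:echo-rlkrc4nxcxzxs")
print(parts)
```

Under that assumption, the namespace (`default`) and job name (`echo-rlkrc4nxcxzxs`) are the values to use when looking for the underlying job in the cluster.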

I have run:

pip install torchx[kubernetes]
kubectl apply -f https://raw.githubusercontent.com/volcano-sh/volcano/v1.6.0/installer/volcano-development.yaml

I am using a local kind cluster, which is otherwise working correctly.

knexator, Jan 20 '25 11:01