Unnecessary hard validation check skips the queue label when the LocalQueue doesn't exist
What happened?
In Kubeflow Training SDK version 1.9.3, if the queue_name parameter of TrainingClient's create_job() function names a LocalQueue that doesn't exist in the given namespace, the user gets the warning shown below, but the queue label is not added to the job because of this hard validation check: https://github.com/kubeflow/trainer/blob/52e41d5d124be36d7eb9754e47f7b21efe67b1ac/sdk/python/kubeflow/training/api/training_client.py#L603
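For example:
    from kubeflow.training import TrainingClient

    # Replace the <...> placeholders with real values before running.
    tc = TrainingClient()
    tc.create_job(
        name="test-pytorch",
        namespace="<test-namespace>",
        train_func=<train_func>,
        num_workers=1,
        queue_name="<non-existant-queue>",
    )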
Warning: Queue '<non-existant-queue>' does not exist in namespace '<test-namespace>'. The job will be created but may not be managed by Kueue.
Result: the PyTorchJob is created without the queue label.
What did you expect to happen?
The queue label should still be added when the LocalQueue doesn't exist, along with the warning shown above.
Expected PyTorchJob created via create_job() with a nonexistent queue:
    apiVersion: kubeflow.org/v1
    kind: PyTorchJob
    metadata:
      labels:
        kueue.x-k8s.io/queue-name: <non-existant-queue>
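The expected behavior can be sketched roughly as follows. This is a minimal illustration, not the SDK's actual code: `apply_queue_label` and its `queue_exists` argument are hypothetical names, and the sketch assumes the result of the LocalQueue lookup is already known. Only the label key and the warning text come from this issue.

```python
import warnings
from typing import Dict, Optional

# Label key taken from the expected PyTorchJob manifest in this issue.
QUEUE_LABEL = "kueue.x-k8s.io/queue-name"


def apply_queue_label(
    labels: Optional[Dict[str, str]],
    queue_name: Optional[str],
    namespace: str,
    queue_exists: bool,
) -> Dict[str, str]:
    """Hypothetical helper: return a copy of the job labels with the queue label set.

    A missing LocalQueue produces a warning but does not skip the label.
    """
    result = dict(labels or {})
    if not queue_name:
        # No queue requested, so there is nothing to add.
        return result
    if not queue_exists:
        # Soft validation: warn the user, then continue.
        warnings.warn(
            f"Queue '{queue_name}' does not exist in namespace '{namespace}'. "
            "The job will be created but may not be managed by Kueue."
        )
    # The label is added whether or not the LocalQueue exists.
    result[QUEUE_LABEL] = queue_name
    return result
```

With this shape, the existence check only decides whether a warning is emitted, so a LocalQueue created after the job would still be able to pick it up by label.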
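Workaround:
Create the job with the labels parameter instead of queue_name:
    from kubeflow.training import TrainingClient

    # Replace the <...> placeholders with real values before running.
    tc = TrainingClient()
    tc.create_job(
        name="test-pytorch",
        namespace="<test-namespace>",
        train_func=<train_func>,
        num_workers=1,
        labels={
            "kueue.x-k8s.io/queue-name": "<non-existant-queue>",
        },
    )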
Environment
PyTorchJob created in a Kubernetes cluster
Impacted by this bug?
Give it a 👍 We prioritize the issues with most 👍
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
This issue has been automatically closed because it has not had recent activity. Please comment "/reopen" to reopen it.
/reopen
@tenzen-y: Reopened this issue.
Hi! I’d like to work on this issue if it’s still available.
I’m starting to investigate and will share progress soon.