yunikorn-k8shim

[YUNIKORN-2504] Support canonical labels for queue/applicationId in scheduler

Open chenyulin0719 opened this issue 7 months ago • 1 comments

What is this PR for?

Support canonical queue and application ID labels on pods, and allow them to coexist with the existing metadata.

  • yunikorn.apache.org/app-id (New, Canonical Label)
  • yunikorn.apache.org/queue (New, Canonical Label)

YuniKorn will reject pods with conflicting metadata after version 1.7.0.

  • Check metadata consistency before moving the task state from 'New' to 'Pending'. The pod metadata check runs in task.sanityCheckBeforeScheduling().
  • Before 1.7.0, if the sanity check fails due to inconsistent metadata, log a warning message.
  • After 1.7.0, if the sanity check fails due to inconsistent metadata, move the task from 'New' to the 'Rejected' state and fail the pod with the reasons.
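
The version-gated behaviour described above can be sketched in Go as follows. This is a minimal illustration, not the code in task.sanityCheckBeforeScheduling(): the names `consistent`, `decide`, and `rejectOnConflict` are hypothetical, the legacy label keys `applicationId` and `queue` are assumed values, and which keys take part in the comparison is also an assumption.

```go
package main

import "fmt"

// Metadata keys. The canonical keys appear in this PR; the legacy label
// keys are assumed values used only for illustration.
const (
	keyCanonicalAppID = "yunikorn.apache.org/app-id"
	keyCanonicalQueue = "yunikorn.apache.org/queue"
	keyLegacyAppID    = "applicationId" // assumption
	keyLegacyQueue    = "queue"         // assumption
)

// Action is what the (hypothetical) sanity check asks the task to do.
type Action string

const (
	Proceed Action = "proceed" // metadata agrees: New -> Pending
	Warn    Action = "warn"    // conflict before 1.7.0: log a warning only
	Reject  Action = "reject"  // conflict after 1.7.0: New -> Rejected
)

// consistent reports whether all non-empty values are identical.
func consistent(values ...string) bool {
	first := ""
	for _, v := range values {
		if v == "" {
			continue
		}
		if first == "" {
			first = v
		} else if v != first {
			return false
		}
	}
	return true
}

// decide compares the canonical label, the annotation, and the legacy label
// for both application ID and queue, then picks an action.
func decide(labels, annotations map[string]string, rejectOnConflict bool) Action {
	appOK := consistent(labels[keyCanonicalAppID], annotations[keyCanonicalAppID], labels[keyLegacyAppID])
	queueOK := consistent(labels[keyCanonicalQueue], annotations[keyCanonicalQueue], labels[keyLegacyQueue])
	switch {
	case appOK && queueOK:
		return Proceed
	case rejectOnConflict:
		return Reject
	default:
		return Warn
	}
}

func main() {
	// Mirrors the pod-with-inconsistent-queue example from the test section.
	labels := map[string]string{keyCanonicalQueue: "root.sandbox"}
	annotations := map[string]string{keyCanonicalQueue: "root.sandbox-another"}
	fmt.Println(decide(labels, annotations, false)) // conflict in warn-only mode
	fmt.Println(decide(labels, annotations, true))  // conflict in reject mode
}
```

Keeping the comparison separate from the warn-or-reject decision means the same check can serve both modes, with only the flag changing between releases.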

The application ID is fetched from the pod in the following order:

  1. Label: constants.CanonicalLabelApplicationID (New)
  2. Annotation: constants.AnnotationApplicationID
  3. Label: constants.LabelApplicationID
  4. Label: constants.SparkLabelAppID

The queue name is fetched from the pod in the following order:

  1. Label: constants.CanonicalLabelQueueName (New)
  2. Annotation: constants.AnnotationQueueName
  3. Label: constants.LabelQueueName (Previously, constants.LabelQueueName took precedence over constants.AnnotationQueueName)
  4. Default: constants.ApplicationDefaultQueue
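
The two lookup orders above amount to a "first non-empty value wins" walk over an ordered list of sources. The Go sketch below illustrates that idea only; it is not the shim's implementation. The helper names (`source`, `firstNonEmpty`, `appID`, `queueName`) are hypothetical, and the string values used for the legacy keys and the default queue are assumptions; only the two canonical keys come from this PR.

```go
package main

import "fmt"

// Key strings. The canonical keys come from this PR; everything marked
// "assumption" is an illustrative stand-in for the matching constant.
const (
	canonicalAppID   = "yunikorn.apache.org/app-id"
	canonicalQueue   = "yunikorn.apache.org/queue"
	legacyLabelAppID = "applicationId"      // assumption
	sparkLabelAppID  = "spark-app-selector" // assumption
	legacyLabelQueue = "queue"              // assumption
	defaultQueue     = "root.default"       // assumption
)

// source names one place to look: a key in either labels or annotations.
type source struct {
	fromLabels bool
	key        string
}

// firstNonEmpty walks the sources in order and returns the first value set.
func firstNonEmpty(labels, annotations map[string]string, order []source) string {
	for _, s := range order {
		m := annotations
		if s.fromLabels {
			m = labels
		}
		if v := m[s.key]; v != "" {
			return v
		}
	}
	return ""
}

// appID follows: canonical label, annotation, legacy label, Spark label.
func appID(labels, annotations map[string]string) string {
	return firstNonEmpty(labels, annotations, []source{
		{true, canonicalAppID},
		{false, canonicalAppID},
		{true, legacyLabelAppID},
		{true, sparkLabelAppID},
	})
}

// queueName follows: canonical label, annotation, legacy label, default.
func queueName(labels, annotations map[string]string) string {
	q := firstNonEmpty(labels, annotations, []source{
		{true, canonicalQueue},
		{false, canonicalQueue},
		{true, legacyLabelQueue},
	})
	if q == "" {
		return defaultQueue
	}
	return q
}

func main() {
	labels := map[string]string{legacyLabelQueue: "root.legacy"}
	annotations := map[string]string{canonicalQueue: "root.sandbox"}
	// The annotation now outranks the legacy label.
	fmt.Println(queueName(labels, annotations))
	// No application ID is set anywhere, so the result is empty.
	fmt.Printf("%q\n", appID(labels, annotations))
}
```

Expressing each order as data makes a precedence change, such as moving the annotation ahead of the legacy queue label, a one-line edit.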

What type of PR is it?

  • [X] - Feature

Todos

  • The admission controller should also fail the pod request if the metadata is inconsistent. Another Jira will be created once this PR is merged.
  • Update the documentation: https://yunikorn.apache.org/docs/next/user_guide/labels_and_annotations_in_yunikorn

What is the Jira issue?

https://issues.apache.org/jira/browse/YUNIKORN-2504

How should this be tested?

Run the simple sleep pods below, which have canonical metadata for queue/app-id but conflicting annotations:


apiVersion: v1
kind: Pod
metadata:
  labels:
    app: sleep
    yunikorn.apache.org/app-id: "application-sleep-0001"
    yunikorn.apache.org/queue: "root.sandbox"
  annotations:
    yunikorn.apache.org/queue: "root.sandbox-another"
  name: pod-with-inconsistent-queue
spec:
  schedulerName: yunikorn
  restartPolicy: Never
  containers:
    - name: sleep-6000s
      image: "alpine:latest"
      command: ["sleep", "6000"]
      resources:
        requests:
          cpu: "100m"
          memory: "500M"
          


---
apiVersion: v1
kind: Pod
metadata:
  labels:
    app: sleep
    yunikorn.apache.org/app-id: "application-sleep-0002"
  annotations:
    yunikorn.apache.org/app-id: "application-sleep-0002-another"
  name: pod-with-inconsistent-app-id
spec:
  schedulerName: yunikorn
  restartPolicy: Never
  containers:
    - name: sleep-6000s
      image: "alpine:latest"
      command: ["sleep", "6000"]
      resources:
        requests:
          cpu: "100m"
          memory: "500M"
       

Check the scheduler pod logs:

kubectl logs -l component=yunikorn-scheduler -n yunikorn --tail=200000 > yunikorn-scheduler-logs.txt

You will see warning logs like these:


2024-07-05T18:26:17.591Z	WARN	shim.cache.task	cache/task.go:582	Task pod has conflicting metadata, the unbound task pod will be rejected after version 1.7.0	{"appID": "application-sleep-00002", "podName": "pod-with-inconsistent-queue", "error": "queue is not consistently set in pod's labels and annotations. [PodInconsistentMetadata]"}

2024-07-05T18:26:17.592Z	WARN	shim.cache.task	cache/task.go:582	Task pod has conflicting metadata, the unbound task pod will be rejected after version 1.7.0	{"appID": "application-sleep-00001-annotation", "podName": "pod-with-inconsistent-app-id", "error": "application ID is not consistently set in pod's labels and annotations. [PodInconsistentMetadata]"}
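
To pick these warnings out of a large log dump, the lines can be filtered on the `PodInconsistentMetadata` marker and the trailing JSON object decoded. The Go sketch below assumes the layout seen in the sample lines above (tab-separated fields with the JSON object last); `conflictFields` and `parseConflict` are hypothetical names, and the line used in `main` is a shortened stand-in rather than real scheduler output.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// conflictFields mirrors the JSON object at the end of the sample log lines.
type conflictFields struct {
	AppID   string `json:"appID"`
	PodName string `json:"podName"`
	Error   string `json:"error"`
}

// parseConflict returns the structured fields of a PodInconsistentMetadata
// warning, or ok=false if the line is something else or cannot be decoded.
func parseConflict(line string) (fields conflictFields, ok bool) {
	if !strings.Contains(line, "PodInconsistentMetadata") {
		return fields, false
	}
	// Assumption: fields are tab-separated and the JSON object comes last.
	parts := strings.Split(line, "\t")
	if err := json.Unmarshal([]byte(parts[len(parts)-1]), &fields); err != nil {
		return conflictFields{}, false
	}
	return fields, true
}

func main() {
	// Shortened, made-up line in the same shape as the samples above.
	line := "2024-07-05T18:26:17.591Z\tWARN\tshim.cache.task\tcache/task.go:582\t" +
		"Task pod has conflicting metadata\t" +
		`{"appID": "app-1", "podName": "pod-1", "error": "queue mismatch [PodInconsistentMetadata]"}`
	if f, ok := parseConflict(line); ok {
		fmt.Println(f.PodName, "->", f.Error)
	}
}
```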

Screenshots (if appropriate)

Before 1.7.0, only a warning message is logged: image

After 1.7.0, the screenshot below shows the behavior without the admission controller. (Another PR will be submitted after version 1.6.0 is released. This is the original screenshot from the closed draft PR.) image

Questions:

NA

chenyulin0719, Jul 05 '24 19:07