
[core] Cannot create identical appwrappers in different namespaces

Open · kpouget opened this issue 1 year ago • 5 comments

When I create an AppWrapper in one namespace, it works as expected. But when I try to create the same AppWrapper in another namespace (with all the resource namespaces properly updated), the AppWrapper transitions to the Failed state.

kpouget, Jun 26 '23 13:06

Thanks for reporting. I think MCAD either needs a --generate name option or needs to generate unique names internally.

asm582, Jun 26 '23 13:06
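The unique-name suggestion can be sketched in the reproducer's own language. This is a minimal sketch, not an MCAD or `oc` feature: `gen_unique_name` is a hypothetical helper that appends a random five-character suffix to a base name, roughly in the spirit of Kubernetes `metadata.generateName`.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of MCAD or oc): append a random 5-character
# suffix to a base name, similar in spirit to Kubernetes metadata.generateName.
gen_unique_name() {
  local base=$1
  local chars=abcdefghijklmnopqrstuvwxyz0123456789
  local suffix='' i idx
  for ((i = 0; i < 5; i++)); do
    # Pick one character at a random offset. Lowercase letters and digits
    # keep the result a valid DNS-1123 name if the base name already is one.
    idx=$((RANDOM % ${#chars}))
    suffix+=${chars:idx:1}
  done
  printf '%s-%s\n' "$base" "$suffix"
}
```

For example, `gen_unique_name aw000-000-30s-job` prints the base name followed by a dash and five random lowercase letters or digits. Renaming wrapped items this way only sidesteps the clash from the user side; if MCAD looks items up by name alone, the underlying bug remains.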

Quick and dirty reproducer:

#! /bin/bash

set -x

APPWRAPPER="
apiVersion: mcad.ibm.com/v1beta1
kind: AppWrapper
metadata:
  name: aw000-000-30s
  namespace: default
spec:
  priority: 10
  resources:
    GenericItems:
    - completionstatus: Complete
      custompodresources:
      - limits:
          cpu: 100m
          memory: 100Mi
        replicas: 1
        requests:
          cpu: 100m
          memory: 100Mi
      generictemplate:
        apiVersion: batch/v1
        kind: Job
        metadata:
          name: aw000-000-30s-job
          namespace: default
        spec:
          activeDeadlineSeconds: 18000
          template:
            metadata:
              name: aw000-000-30s-job-pod
            spec:
              containers:
              - args:
                - sleep \$RUNTIME
                command:
                - bash
                - -c
                env:
                - name: RUNTIME
                  value: '30'
                image: registry.access.redhat.com/ubi8/ubi
                name: main
                resources:
                  limits:
                    cpu: 100m
                    memory: 100Mi
                  requests:
                    cpu: 100m
                    memory: 100Mi
              restartPolicy: Never
      replicas: 4
    Items: []
"

oc delete ns test1 test2 --ignore-not-found

oc create ns test1
oc create ns test2

echo "$APPWRAPPER" | sed 's/namespace: default/namespace: test1/'  | oc apply -f-
echo "$APPWRAPPER" | sed 's/namespace: default/namespace: test2/'  | oc apply -f-

# Poll both AppWrappers until one of them reaches the Failed state, then dump it.
while true; do
    for ns in test1 test2; do
        if oc get appwrapper.mcad.ibm.com/aw000-000-30s -n "$ns" -o jsonpath='{.status.state}' | grep -q Failed; then
            oc get appwrapper.mcad.ibm.com/aw000-000-30s -n "$ns" -o yaml
            break 2
        fi
    done
    sleep 2
done

Observed output (relevant part):

status:
  canrun: true
  conditions:
  - lastTransitionMicroTime: "2023-07-27T09:11:55.530986Z"
    lastUpdateMicroTime: "2023-07-27T09:11:55.530986Z"
    status: "True"
    type: Init
  - lastTransitionMicroTime: "2023-07-27T09:11:55.569199Z"
    lastUpdateMicroTime: "2023-07-27T09:11:55.569199Z"
    reason: AwaitingHeadOfLine
    status: "True"
    type: Queueing
  - lastTransitionMicroTime: "2023-07-27T09:11:55.574809Z"
    lastUpdateMicroTime: "2023-07-27T09:11:55.574809Z"
    reason: FrontOfQueue.
    status: "True"
    type: HeadOfLine
  - lastTransitionMicroTime: "2023-07-27T09:11:58.182186Z"
    lastUpdateMicroTime: "2023-07-27T09:11:58.182185Z"
    message: 'test2/aw000-000-30s creation failure: jobs.batch "aw000-000-30s-job"
      not found'
    reason: ItemCreationFailure.
    status: "True"
    type: Failed
  controllerfirsttimestamp: "2023-07-27T09:11:55.530986Z"
  filterignore: true
  queuejobstate: Failed
  sender: before manageQueueJob - afterEtcdDispatching
  state: Failed
  systempriority: 10

kpouget, Jul 27 '23 09:07

This is the same issue as #383

metalcycling, Jul 27 '23 12:07

I'm confused about what is causing this, given that https://github.com/project-codeflare/multi-cluster-app-dispatcher/blob/44696cfcc692db8a47386bca5aba4ff4b81d3217/pkg/controller/queuejob/queuejob_controller_ex.go#L145 exists. It seems that somewhere internally the AppWrapper is referenced by its name alone.

KPostOffice, Aug 24 '23 20:08
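The name-only hypothesis can be illustrated with a toy model. This is a minimal sketch assuming that some component stores AppWrappers (or their wrapped items) in a table keyed by name alone; where that happens in MCAD is not identified in this thread, and `record`, `by_name` and `by_ns_name` are hypothetical names. The second `record` call silently overwrites the first entry in the name-keyed table, while the namespace/name table keeps both.

```shell
#!/usr/bin/env bash
# Toy model of the suspected bug, not MCAD code: one lookup table keyed by
# object name only, one keyed by "namespace/name" (requires bash 4+).
declare -A by_name=() by_ns_name=()

# Record that an object named $2 lives in namespace $1.
record() {
  local ns=$1 name=$2
  by_name["$name"]=$ns
  by_ns_name["$ns/$name"]=$ns
}

# The reproducer creates the same AppWrapper name in two namespaces.
record test1 aw000-000-30s
record test2 aw000-000-30s

echo "keyed by name:           ${#by_name[@]} entry, owner ${by_name[aw000-000-30s]}"
echo "keyed by namespace/name: ${#by_ns_name[@]} entries"
```

This prints one entry owned by test2 for the name-keyed table and two entries for the namespace/name table: the test1 object has effectively disappeared from the first view.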

@Fiona-Waters and I faced a similar issue, this time when the AppWrapper contains an item that has the same name as an existing resource of the same kind in another namespace.

For example:

  1. Create a Job in namespace A.
  2. Then create an AppWrapper in namespace B that wraps the same Job as an item.
  3. The Job corresponding to the AppWrapper in namespace B is never created, even though the AppWrapper has been dispatched.

/cc @ChristianZaccaria

astefanutti, Sep 13 '23 12:09
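The three steps of the cross-namespace scenario can be scripted along the lines of the earlier reproducer. A minimal sketch, assuming the same `mcad.ibm.com/v1beta1` AppWrapper schema the reproducer uses; the Job name `clash-job`, the AppWrapper name `clash-aw` and the namespaces `ns-a`/`ns-b` are hypothetical. The functions only print manifests, so their output can be inspected before anything is sent to a cluster.

```shell
#!/usr/bin/env bash
# Sketch of a reproducer for the cross-namespace name clash (steps 1-3).
# Names are hypothetical: Job "clash-job", AppWrapper "clash-aw".
set -euo pipefail

# Step 1: a plain Job, created directly in namespace $1.
job_manifest() {
  local ns=$1
  cat <<EOF
apiVersion: batch/v1
kind: Job
metadata:
  name: clash-job
  namespace: $ns
spec:
  template:
    spec:
      containers:
      - name: main
        image: registry.access.redhat.com/ubi8/ubi
        command: ["sleep", "30"]
        resources:
          requests:
            cpu: 100m
            memory: 100Mi
          limits:
            cpu: 100m
            memory: 100Mi
      restartPolicy: Never
EOF
}

# Step 2: an AppWrapper in namespace $1 wrapping a Job with the same name,
# indented by 8 spaces so it nests under generictemplate.
appwrapper_manifest() {
  local ns=$1
  cat <<EOF
apiVersion: mcad.ibm.com/v1beta1
kind: AppWrapper
metadata:
  name: clash-aw
  namespace: $ns
spec:
  resources:
    GenericItems:
    - replicas: 1
      custompodresources:
      - replicas: 1
        requests:
          cpu: 100m
          memory: 100Mi
        limits:
          cpu: 100m
          memory: 100Mi
      generictemplate:
$(job_manifest "$ns" | sed 's/^/        /')
EOF
}
```

With both namespaces created, run `job_manifest ns-a | oc apply -f-` followed by `appwrapper_manifest ns-b | oc apply -f-`. If the behaviour reported above reproduces, `oc get job clash-job -n ns-b` keeps failing with a NotFound error even after the AppWrapper has been dispatched.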