multi-cluster-app-dispatcher
[core] Cannot create identical appwrappers in different namespaces
When I create an AppWrapper in one namespace, it works as expected. But when I try to create the same AppWrapper in another namespace (with all the resource namespaces properly updated), the AppWrapper goes into the Failed state.
- Source AppWrappers: aw000-000-30s.yaml.log
- MCAD logs: mcad.log
- Failed AppWrappers: appwrappers.yaml.log
Thanks for reporting. I think MCAD either needs a --generate name, or needs to generate a unique name internally.
Quick and dirty reproducer:
#! /bin/bash
set -x
APPWRAPPER="
apiVersion: mcad.ibm.com/v1beta1
kind: AppWrapper
metadata:
  name: aw000-000-30s
  namespace: default
spec:
  priority: 10
  resources:
    GenericItems:
    - completionstatus: Complete
      custompodresources:
      - limits:
          cpu: 100m
          memory: 100Mi
        replicas: 1
        requests:
          cpu: 100m
          memory: 100Mi
      generictemplate:
        apiVersion: batch/v1
        kind: Job
        metadata:
          name: aw000-000-30s-job
          namespace: default
        spec:
          activeDeadlineSeconds: 18000
          template:
            metadata:
              name: aw000-000-30s-job-pod
            spec:
              containers:
              - args:
                - sleep \$RUNTIME
                command:
                - bash
                - -c
                env:
                - name: RUNTIME
                  value: '30'
                image: registry.access.redhat.com/ubi8/ubi
                name: main
                resources:
                  limits:
                    cpu: 100m
                    memory: 100Mi
                  requests:
                    cpu: 100m
                    memory: 100Mi
              restartPolicy: Never
      replicas: 4
    Items: []
"
oc delete ns test1 test2 --ignore-not-found
oc create ns test1
oc create ns test2
echo "$APPWRAPPER" | sed 's/namespace: default/namespace: test1/' | oc apply -f-
echo "$APPWRAPPER" | sed 's/namespace: default/namespace: test2/' | oc apply -f-
while true; do
  if oc get appwrapper.mcad.ibm.com/aw000-000-30s -n test1 -ojsonpath={.status.state} | grep Failed; then
    oc get appwrapper.mcad.ibm.com/aw000-000-30s -n test1 -oyaml
    break
  fi
  if oc get appwrapper.mcad.ibm.com/aw000-000-30s -n test2 -ojsonpath={.status.state} | grep Failed; then
    oc get appwrapper.mcad.ibm.com/aw000-000-30s -n test2 -oyaml
    break
  fi
done
Expected output (relevant part):
status:
  canrun: true
  conditions:
  - lastTransitionMicroTime: "2023-07-27T09:11:55.530986Z"
    lastUpdateMicroTime: "2023-07-27T09:11:55.530986Z"
    status: "True"
    type: Init
  - lastTransitionMicroTime: "2023-07-27T09:11:55.569199Z"
    lastUpdateMicroTime: "2023-07-27T09:11:55.569199Z"
    reason: AwaitingHeadOfLine
    status: "True"
    type: Queueing
  - lastTransitionMicroTime: "2023-07-27T09:11:55.574809Z"
    lastUpdateMicroTime: "2023-07-27T09:11:55.574809Z"
    reason: FrontOfQueue.
    status: "True"
    type: HeadOfLine
  - lastTransitionMicroTime: "2023-07-27T09:11:58.182186Z"
    lastUpdateMicroTime: "2023-07-27T09:11:58.182185Z"
    message: 'test2/aw000-000-30s creation failure: jobs.batch "aw000-000-30s-job"
      not found'
    reason: ItemCreationFailure.
    status: "True"
    type: Failed
  controllerfirsttimestamp: "2023-07-27T09:11:55.530986Z"
  filterignore: true
  queuejobstate: Failed
  sender: before manageQueueJob - afterEtcdDispatching
  state: Failed
  systempriority: 10
This is the same issue as #383
I'm confused about what is causing this, given that https://github.com/project-codeflare/multi-cluster-app-dispatcher/blob/44696cfcc692db8a47386bca5aba4ff4b81d3217/pkg/controller/queuejob/queuejob_controller_ex.go#L145 exists. It seems that somewhere internally the resource is referenced only by its name.
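To illustrate the suspicion, here is a minimal sketch of why a lookup keyed only by name would collide across namespaces while a namespace/name key would not. This is not MCAD code: the helper functions `key_by_name`, `key_by_ns_name`, and `count_distinct` are hypothetical, and only the namespace and Job names come from the reproducer.

```shell
#!/bin/sh
# Hypothetical illustration only; this does not reflect MCAD's actual internals.

# Build a lookup key from the resource name alone (namespace is ignored).
key_by_name() { printf '%s\n' "$2"; }

# Build a lookup key from both namespace and name.
key_by_ns_name() { printf '%s/%s\n' "$1" "$2"; }

# Count how many distinct keys arrive on stdin.
count_distinct() { sort -u | wc -l | tr -d ' '; }

name_only=$( {
  key_by_name test1 aw000-000-30s-job
  key_by_name test2 aw000-000-30s-job
} | count_distinct )

ns_and_name=$( {
  key_by_ns_name test1 aw000-000-30s-job
  key_by_ns_name test2 aw000-000-30s-job
} | count_distinct )

echo "name-only keys: $name_only"            # the two Jobs share one key
echo "namespace/name keys: $ns_and_name"     # each Job gets its own key
```

With name-only keys the two Jobs are indistinguishable, which would be consistent with the second Job being treated as already handled or not found.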
@Fiona-Waters and I faced a similar issue, this time when the AppWrapper contains an item that has the same name as an existing resource of the same kind in another namespace.
For example:
- Create a Job in namespace A.
- Then create an AppWrapper, with the same Job as the wrapped item, in namespace B.
- The Job corresponding to the AppWrapper in namespace B is never created, even though the AppWrapper has been dispatched.
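The steps above could be scripted roughly as follows. This is only a sketch: the namespace names (`ns-a`, `ns-b`), the Job name (`shared-job`), the AppWrapper name, and the 30-second wait are assumptions, and the AppWrapper fields are adapted from the reproducer earlier in this thread.

```shell
#!/bin/bash
# Sketch of a reproducer; all names below are hypothetical placeholders.
JOB_NAME="shared-job"

# Print a Job manifest for namespace $1, with every line prefixed by $2.
job_manifest() {
  sed "s/^/$2/" <<EOF
apiVersion: batch/v1
kind: Job
metadata:
  name: ${JOB_NAME}
  namespace: $1
spec:
  template:
    spec:
      containers:
      - name: main
        image: registry.access.redhat.com/ubi8/ubi
        command: ["sleep", "30"]
        resources:
          limits:
            cpu: 100m
            memory: 100Mi
          requests:
            cpu: 100m
            memory: 100Mi
      restartPolicy: Never
EOF
}

# Print an AppWrapper for namespace $1 that wraps the same Job.
appwrapper_manifest() {
  cat <<EOF
apiVersion: mcad.ibm.com/v1beta1
kind: AppWrapper
metadata:
  name: aw-${JOB_NAME}
  namespace: $1
spec:
  priority: 10
  resources:
    GenericItems:
    - completionstatus: Complete
      custompodresources:
      - limits:
          cpu: 100m
          memory: 100Mi
        replicas: 1
        requests:
          cpu: 100m
          memory: 100Mi
      replicas: 1
      generictemplate:
EOF
  job_manifest "$1" "        "
}

# Run the scenario against the current cluster (requires oc and MCAD).
reproduce() {
  oc create ns ns-a
  oc create ns ns-b
  job_manifest ns-a "" | oc apply -f-          # plain Job in namespace A
  appwrapper_manifest ns-b | oc apply -f-      # wrapped Job in namespace B
  sleep 30                                     # arbitrary wait for dispatch
  oc get appwrapper.mcad.ibm.com/"aw-${JOB_NAME}" -n ns-b -ojsonpath='{.status.state}'
  oc get job "${JOB_NAME}" -n ns-b             # per the report, this Job is missing
}
```

Nothing touches the cluster until `reproduce` is called explicitly; the two manifest functions can be inspected on their own first.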
/cc @ChristianZaccaria