fate-operator
FateCluster Fails reconciliation
Hi,
After deploying the FATE operator and applying the kubefate and fatecluster samples, the operator seems to fail during the reconciliation phase, leaving the fatecluster CRD stuck in status Creating.
Here is the request that fails, from the controller-side log:
2021-03-24T07:44:49.001Z DEBUG controllers.FateCluster request info {"url": "http://kubefate-kubefate-kubefate-sample.kube-fate:8080/v1/cluster/8e4c85be-4428-4f51-a55d-bac3db91816c", "type": "GET", "body": ""}
and here is the log from the service side:
2021/03/24 07:52:25 /workspace/pkg/modules/cluster_db.go:135 record not found [0.611ms] [rows:0] SELECT * FROM `clusters` WHERE uuid = '8e4c85be-4428-4f51-a55d-bac3db91816c' AND `clusters`.`deleted_at` IS NULL ORDER BY `clusters`.`id` LIMIT 1
2021-03-24T07:52:25Z ERR workspace/pkg/api/cluster.go:152 > get cluster error error="record not found" uuid=8e4c85be-4428-4f51-a55d-bac3db91816c
2021-03-24T07:52:25Z ERR usr/local/go/src/net/http/server.go:1919 > Request ip=10.244.0.5 latency=1.1971 method=GET path=/v1/cluster/8e4c85be-4428-4f51-a55d-bac3db91816c status=500 user-agent=Go-http-client/1.1
Do I need to run any init steps before using the examples?
Thanks
It seems the FATE cluster is still deploying, and the log from the controller is only a debug message. Does everything work after the FATE CRD is created? Or can you describe the pod status of the FATE cluster and see if there is any error there?
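The checks asked for here could be run roughly like this. It is only a sketch: the helper name inspect_fatecluster and the KUBECTL variable are made up for illustration, and the namespace and resource name in the usage line (fate-9999, fatecluster-sample) are assumed from the sample manifests, so adjust them to your deployment.

```shell
#!/bin/sh
# KUBECTL may be overridden, e.g. KUBECTL=echo for a dry run that only prints the commands.
KUBECTL="${KUBECTL:-kubectl}"

# inspect_fatecluster NAMESPACE NAME (hypothetical helper, not part of fate-operator)
# Shows the custom resource, the pods in its namespace, and recent events.
inspect_fatecluster() {
  ns="$1"
  name="$2"
  "$KUBECTL" describe fatecluster "$name" -n "$ns"
  "$KUBECTL" get pods -n "$ns" -o wide
  "$KUBECTL" get events -n "$ns" --sort-by=.lastTimestamp
}

# Usage (needs access to the cluster):
#   inspect_fatecluster fate-9999 fatecluster-sample
```

If the pods in the namespace are missing or crash-looping, the describe and events output is usually the quickest place to see why.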
The problem seems to be that the CRD stays stuck. After applying ./config/samples/app_v1beta1_fatecluster.yaml, the CRD remains in status Creating, probably because the controller can't finish the reconcile?
Thanks for help
Hi there @LaynePeng, did you manage to investigate?
We still cannot reproduce this problem. Are there any other hints in the logs? @owlet42, do you have any idea about this problem?
It may be a one-off. If there is more log information, maybe it can be solved.
Hi there, I don't have more logs than what you see above, but I can reproduce the problem easily with:
cat << EOF > clusterconfig-1.18.yaml
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
  image: kindest/node:v1.18.8
  extraPortMappings:
  - containerPort: 31080
    hostPort: 80
  - containerPort: 31443
    hostPort: 443
EOF
kind create cluster --config clusterconfig-1.18.yaml --name fate-operator
Then, from the fate-operator root folder:
export IMG=federatedai/fate-controller:bc5420bbe25
make docker-build-without-test
kind load docker-image federatedai/fate-controller:bc5420bbe25 --name fate-operator
make deploy
k apply -f config/samples/rbac-config.yaml
k apply -f config/samples/kubefate-secret.yaml
k create ns fate-9999
k create -f ./config/samples/app_v1beta1_kubefate.yaml
k get pods -n kube-fate
kubectl create -f ./config/samples/app_v1beta1_fatecluster.yaml
kubectl get fatecluster -A
fate-9999 fatecluster-sample 9999 Creating
k logs fate-operator-controller-manager-86b58ffc9b-666sh manager -n fate-operator-system
2021-04-11T11:02:36.886Z DEBUG controllers.FateCluster retry {"retry": 3}
2021-04-11T11:02:36.887Z DEBUG controllers.FateCluster request info {"url": "http://kubefate-kubefate-kubefate-sample.kube-fate:8080/v1/cluster/562481ab-6c84-4279-888a-ff81b5e7e965", "type": "GET", "body": ""}
2021-04-11T11:02:37.641Z DEBUG controllers.FateCluster request code {"Type": "GET", "Path": "cluster/562481ab-6c84-4279-888a-ff81b5e7e965", "respCode": 500, "respBody": "{\"error\":\"record not found\"}"}
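To confirm that every retry hits the same cluster UUID and the same 500 response, the controller log can be filtered with sed. This is a minimal sketch: the function names are made up for illustration, and it assumes the "request code" lines keep the format shown above.

```shell
#!/bin/sh
# Sample "request code" line copied from the controller log above.
line='2021-04-11T11:02:37.641Z DEBUG controllers.FateCluster request code {"Type": "GET", "Path": "cluster/562481ab-6c84-4279-888a-ff81b5e7e965", "respCode": 500, "respBody": "{\"error\":\"record not found\"}"}'

# Print the HTTP status code of each "request code" line read from stdin.
extract_resp_code() {
  sed -n 's/.*"respCode": \([0-9][0-9]*\).*/\1/p'
}

# Print the cluster UUID of each "request code" line read from stdin.
extract_cluster_uuid() {
  sed -n 's/.*"Path": "cluster\/\([0-9a-f-]*\)".*/\1/p'
}

printf '%s\n' "$line" | extract_resp_code
printf '%s\n' "$line" | extract_cluster_uuid
```

Against a live cluster, the same functions can be fed from k logs, for example piping the controller log through extract_cluster_uuid and then sort | uniq -c to count lookups per UUID.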
Hi @owlet42, did you manage to investigate?
Any new update about this issue?
Hi @LaynePeng, not from my side. I haven't seen any relevant commit. Do I need to repeat the test?
Any new update about this issue?