k8s-argus icon indicating copy to clipboard operation
k8s-argus copied to clipboard

TRANSIENT_FAILURE errors on Argus start

Open ddeasey opened this issue 8 years ago • 1 comments

  • Argus Version: 0.2.0-alpha.0
  • Kubernetes Version: 1.7.6
  • Docker Version:

Argus pod in an 'Error' state after helm deploy. 'TRANSIENT_FAILURE' message in the logs.

$ kubectl get pod --all-namespaces
NAMESPACE     NAME                                       READY     STATUS             RESTARTS   AGE
kube-system   argus-4248942165-9vgdx                     0/1       CrashLoopBackOff   7          15m
kube-system   collectorset-controller-4255353268-kwbgh   1/1       Running            0          15m
kube-system   kube-dns-2834558388-3h7m6                  3/3       Running            0          43d
kube-system   tiller-deploy-3341511835-rsksq             1/1       Running            0          5d

argus pod logs:

$ kubectl logs argus-4248942165-9vgdx -n kube-system
time="2017-11-15T21:05:00Z" level=info msg="Waiting for gRPC"
time="2017-11-15T21:05:01Z" level=info msg="Waiting for gRPC"
time="2017-11-15T21:05:01Z" level=info msg="Waiting for gRPC"
time="2017-11-15T21:05:02Z" level=info msg="Waiting for gRPC"
time="2017-11-15T21:05:02Z" level=info msg="Waiting for gRPC"
time="2017-11-15T21:05:05Z" level=info msg="Waiting for gRPC"
time="2017-11-15T21:05:05Z" level=info msg="Waiting for gRPC"
time="2017-11-15T21:05:08Z" level=info msg="Waiting for gRPC"
time="2017-11-15T21:05:08Z" level=info msg="Waiting for gRPC"
time="2017-11-15T21:05:10Z" level=fatal msg="Failed waiting for gRPC to ready, state is \"TRANSIENT_FAILURE\""

no container logs visible

$ kubectl logs collectorset-controller-4255353268-kwbgh -n kube-system
$

Helm deploy command:

helm upgrade --debug --install --version 0.2.0 --namespace kube-system \
                            --set collectorset-controller.imageTag=0.1.0-alpha.0 \
                            --set global.accessID=‘REDACTED’ \
                            --set global.accessKey='REDACTED' \
                            --set global.account='REDACTED' \
                            --set enableRBAC=false  \
                            --set collectorset-controller.enableRBAC=false \
                            --set clusterName='REDACTED' \
                            --set imageTag=0.2.0-alpha.0 \
                            argus logicmonitor/argus

ddeasey avatar Nov 15 '17 21:11 ddeasey

@ddeasey Looks like a network error caused the gRPC connection to fail. However, I do think a better error message could be implemented. Going to leave open for now.

andrewrynhard avatar Nov 21 '17 19:11 andrewrynhard