Intermittent CI error: DNS fails to resolve the service

Open sunya-ch opened this issue 10 months ago • 0 comments

What happened?

CI fails intermittently with the errors below; rerunning the job several times eventually lets it pass.

error: connection error: Post "http://kepler-model-server.kepler.svc.cluster.local:8100/model": dial tcp: lookup kepler-model-server.kepler.svc.cluster.local on 10.96.0.10:53: no such host (http://kepler-model-server.kepler.svc.cluster.local:8100/model))
Error from server (InternalError): error when creating "tasks/train-task.yaml": Internal error occurred: failed calling webhook "webhook.pipeline.tekton.dev": failed to call webhook: Post "https://tekton-pipelines-webhook.tekton-pipelines.svc:443/defaulting?timeout=10s": dial tcp 10.96.111.114:443: connect: connection refused
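The first error suggests the client calls the model server before its service name is resolvable in the cluster. As a possible stopgap while the root cause is investigated, the sketch below waits for a hostname to resolve, with exponential backoff, before the first request is sent. The function name `wait_for_dns` and all of its parameters are hypothetical and not part of Kepler. It only covers the DNS lookup failure, not the Tekton webhook "connection refused" error.

```python
import socket
import time
from typing import Callable


def wait_for_dns(
    host: str,
    port: int,
    attempts: int = 10,
    initial_delay: float = 1.0,
    max_delay: float = 15.0,
    resolver: Callable = socket.getaddrinfo,      # injectable for testing
    sleep: Callable[[float], None] = time.sleep,  # injectable for testing
) -> bool:
    """Return True once `host` resolves, False if every attempt fails.

    Hypothetical helper: retries the lookup with exponential backoff,
    doubling the delay after each failure up to `max_delay` seconds.
    """
    delay = initial_delay
    for attempt in range(1, attempts + 1):
        try:
            resolver(host, port)
            return True
        except socket.gaierror as err:
            print(f"attempt {attempt}/{attempts}: {host} not resolvable yet ({err})")
            if attempt < attempts:
                sleep(delay)
                delay = min(delay * 2, max_delay)
    return False
```

A CI step could call `wait_for_dns("kepler-model-server.kepler.svc.cluster.local", 8100)` and abort early with a clear message when it returns False, which would at least separate slow DNS propagation from a service that was never created.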

What did you expect to happen?

Investigate root cause and fix

How can we reproduce it (as minimally and precisely as possible)?

Push a PR to trigger CI. The failure is intermittent, so several runs may be needed.

Anything else we need to know?

No response

Kepler image tag

Deployment

  • [ ] Model server
  • [ ] Estimator
  • [ ] Online trainer
  • [ ] Offline trainer
  • [ ] Profiler

Kepler model server image tag if deployed

Kepler estimator image tag if deployed

Kepler online trainer image tag if deployed

Kepler offline trainer image tag if deployed

Kepler profiler image tag if deployed

Kubernetes version

$ kubectl version
# paste output here

Install tools

Kepler deployment config

For Kubernetes:

$ KEPLER_NAMESPACE=kepler

# provide kepler configmap
$ kubectl get configmap kepler-cfm -n ${KEPLER_NAMESPACE} 
# paste output here

# provide kepler model server configmap if Kepler Model Server is deployed 
$ kubectl get configmap kepler-model-server-cfm -n ${KEPLER_NAMESPACE} 
# paste output here

# provide kepler deployment description
$ kubectl describe deployment kepler-exporter -n ${KEPLER_NAMESPACE} 

For standalone:

put your Kepler command argument here

sunya-ch • Apr 05 '24 05:04