
After Istio injection, a testkube connection error occurs

Zogoo opened this issue 2 years ago • 13 comments

Describe the bug A K8s environment with Istio mTLS connections between pods. With Istio injection enabled, "testkube" cannot connect to the actual test endpoint; the TLS connection fails with a "connection reset by peer" error.

To Reproduce Steps to reproduce the behavior:

  1. Run 'kubectl testkube run test'
  2. Enable Istio injection for the "testkube" namespace: kubectl label namespace testkube istio-injection=enabled --overwrite
  3. See the errors:
     error: error trying to reach service: read tcp 172.17.0.1:59970->172.17.0.38:8088: read: connection reset by peer
     ⨯ getting test suites executions list (error: api/GET-testkube.TestSuiteExecutionsResult returned error: api server response: '{"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"error trying to reach service: read tcp 172.17.0.1:59972-\u003e172.17.0.38:8088: read: connection reset by peer","reason":"ServiceUnavailable","code":503}

Expected behavior Testkube commands connect to the API server and test endpoints successfully, even with Istio sidecar injection enabled.

Version / Cluster

  • Which testkube version? : 1.2.48
  • What Kubernetes cluster? (e.g. GKE, EKS, Openshift etc, local KinD, local Minikube) Minikube, Istio mTLS between pods.
  • What Kubernetes version? v1.21.11

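For reference, strict mTLS between pods of the kind described above is typically enabled with an Istio PeerAuthentication resource like the following. This is a hedged sketch of a common setup, not the reporter's actual configuration, which was not shared in this issue:

```yaml
# Hypothetical mesh-wide strict mTLS policy; placing it in the Istio
# root namespace (istio-system by default) applies it to the whole mesh.
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: istio-system
spec:
  mtls:
    mode: STRICT
```

Under STRICT mode, plaintext traffic from clients without a sidecar is rejected, which is one common source of "connection reset by peer" errors against in-mesh services.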
Screenshots

$kube get pods -n testkube
NAME                                                    READY   STATUS    RESTARTS   AGE
testkube-api-server-f86c985b8-297fs                     2/2     Running   1          20h
testkube-dashboard-6f5f84f8d8-5b9t7                     2/2     Running   0          20h
testkube-minio-testkube-64cd475b94-fc5hb                2/2     Running   0          20h
testkube-mongodb-6c9c5db4d5-wq9xh                       2/2     Running   0          20h
testkube-operator-controller-manager-66ff4cdfd4-tblg2   3/3     Running   1          20h

Additional context

  • 172.17.0.37:8088 - API server internal IP address
  • 172.17.0.1:59970 - Not sure which pod's IP address it is.

API server POD log

Available migrations for v1.3.0
No migrations available for v1.3.0
{"level":"warn","ts":1657104355.4716427,"caller":"api-server/main.go:105","msg":"Getting uniqe clusterId","error":null}
{"level":"info","ts":1657104355.5050254,"caller":"v1/server.go:279","msg":"Testkube API configured","namespace":"testkube","clusterId":"clusterbb669eef1b556e914a11107ae51ccfa9","telemetry":false}
segment 2022/07/06 10:45:55 ERROR: sending request - Post "https://api.segment.io/v1/batch": read tcp 172.17.0.37:40666->35.155.223.175:443: read: connection reset by peer
segment 2022/07/06 10:45:55 ERROR: 1 messages dropped because they failed to be sent and the client was closed
{"level":"info","ts":1657104355.5379503,"caller":"api-server/main.go:130","msg":"starting Testkube API server","telemetryEnabled":true,"clusterId":"clusterbb669eef1b556e914a11107ae51ccfa9","namespace":"testkube"}

 ┌───────────────────────────────────────────────────┐
 │                   Fiber v2.31.0                   │
 │               http://127.0.0.1:8088               │
 │       (bound on host 0.0.0.0 and port 8088)       │
 │                                                   │
 │ Handlers ........... 166  Processes ........... 1 │
 │ Prefork ....... Disabled  PID ................. 1 │
 └───────────────────────────────────────────────────┘

segment 2022/07/06 10:45:55 ERROR: sending request - Post "https://api.segment.io/v1/batch": read tcp 172.17.0.37:32808->52.34.77.50:443: read: connection reset by peer
segment 2022/07/06 10:45:55 ERROR: 1 messages dropped because they failed to be sent and the client was closed
segment 2022/07/06 11:45:55 ERROR: sending request - Post "https://api.segment.io/v1/batch": EOF
segment 2022/07/06 11:45:55 ERROR: 1 messages dropped because they failed to be sent and the client was closed
segment 2022/07/06 12:45:55 ERROR: sending request - Post "https://api.segment.io/v1/batch": EOF
segment 2022/07/06 12:45:55 ERROR: 1 messages dropped because they failed to be sent and the client was closed
segment 2022/07/06 13:45:55 ERROR: sending request - Post "https://api.segment.io/v1/batch": EOF
segment 2022/07/06 13:45:55 ERROR: 1 messages dropped because they failed to be sent and the client was closed
segment 2022/07/06 14:45:55 ERROR: sending request - Post "https://api.segment.io/v1/batch": EOF
segment 2022/07/06 14:45:55 ERROR: 1 messages dropped because they failed to be sent and the client was closed
segment 2022/07/06 15:45:55 ERROR: sending request - Post "https://api.segment.io/v1/batch": EOF
segment 2022/07/06 15:45:55 ERROR: 1 messages dropped because they failed to be sent and the client was closed
segment 2022/07/06 16:45:55 ERROR: sending request - Post "https://api.segment.io/v1/batch": EOF
segment 2022/07/06 16:45:55 ERROR: 1 messages dropped because they failed to be sent and the client was closed
segment 2022/07/06 17:45:55 ERROR: sending request - Post "https://api.segment.io/v1/batch": EOF
segment 2022/07/06 17:45:55 ERROR: 1 messages dropped because they failed to be sent and the client was closed
segment 2022/07/06 18:45:55 ERROR: sending request - Post "https://api.segment.io/v1/batch": EOF
segment 2022/07/06 18:45:55 ERROR: 1 messages dropped because they failed to be sent and the client was closed
segment 2022/07/06 19:45:55 ERROR: sending request - Post "https://api.segment.io/v1/batch": read tcp 172.17.0.37:44758->44.241.139.196:443: read: connection reset by peer
segment 2022/07/06 19:45:55 ERROR: 1 messages dropped because they failed to be sent and the client was closed
<omitting repeated error logs>

Zogoo avatar Jul 07 '22 06:07 Zogoo

Thank you, @Zogoo. Really glad that users have started using Testkube in complex K8s envs, like Istio or Openshift. We didn't try it in such a configuration. Need to check it out.

vsukhin avatar Jul 07 '22 07:07 vsukhin

Hi @Zogoo, thanks for reporting the Istio issue - we'll try to reproduce it on our clusters, as it's the first time we're facing an Istio-based one.

Can you share your network policy config with us? Is your service allowed to communicate with Testkube?

exu avatar Jul 07 '22 07:07 exu

@vsukhin, @exu I'm also very glad for the fast response and the excitement about the issue. I'm sorry for the lack of information; I have just updated the steps. @exu, we don't have any specific blocking rules between services and pods. If you tell me which configs you need for the investigation, I'll be happy to help with that.

Zogoo avatar Jul 07 '22 08:07 Zogoo

Hello @Zogoo,

We have updated our Helm charts with a policy that allows running Testkube in an Istio setup with mTLS enabled. Please upgrade the charts in your k8s cluster to the latest version, 1.3.20, and let us know about the results.

ypoplavs avatar Jul 14 '22 13:07 ypoplavs

@ypoplavs, hey, thanks for the rapid update. I was expecting a new tag on GitHub when you mentioned "1.3.20" - or am I missing another way to check it? Could you please provide some direction on how to get the 1.3.20 version? Again, thanks for your lightning-fast correspondence.

I'm not familiar with Helm, but which chart should be updated - the "testkube" chart? Does this look correct?

zogoo% helm search repo kubeshop
NAME                            	CHART VERSION	APP VERSION	DESCRIPTION
kubeshop/api-server             	0.11.16      	0.11.16    	A Helm chart for Api-server
kubeshop/kusk-gateway           	0.0.27       	v1.1.1     	A Helm chart for Kusk Gateway
kubeshop/kusk-gateway-api       	0.1.5        	v1.1.0     	A Helm chart for the Kusk Gateway API
kubeshop/kusk-gateway-dashboard 	0.1.4        	v1.1.0     	A Helm chart for Kusk Gateway API's dashboard
kubeshop/kusk-gateway-envoyfleet	0.0.3        	v0.0.0     	A Helm chart for Kusk Gateway EnvoyFleet
kubeshop/testkube               	1.3.20       	           	A Helm chart for testkube.
kubeshop/testkube-api           	1.3.10       	1.3.10     	A Helm chart for Testkube api
kubeshop/testkube-dashboard     	1.3.2        	1.3.2      	A Helm chart for Kubernetes
kubeshop/testkube-operator      	1.3.0        	           	A Helm chart for the testkube-operator (install...
kubeshop/tracetest              	0.1.34       	v0.5.6     	A Helm chart for Trace Test

Zogoo avatar Jul 14 '22 15:07 Zogoo

Hey, Zogoo. The simplest way is to run our CLI command kubectl testkube upgrade. It will install the latest Helm chart for you. Or manually: helm upgrade --install testkube kubeshop/testkube
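The two upgrade paths above can be sketched as a shell sequence. This assumes the kubeshop Helm repo is already added and that Testkube is installed in the testkube namespace; adjust names to your cluster:

```shell
# Option 1: let the Testkube CLI upgrade its own Helm release
kubectl testkube upgrade

# Option 2: upgrade the chart manually with Helm
helm repo update
helm upgrade --install testkube kubeshop/testkube --namespace testkube
```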

vsukhin avatar Jul 14 '22 16:07 vsukhin

@vsukhin cc: @ypoplavs @exu I still couldn't test it, because I hit another error in my env: https://github.com/kubeshop/testkube/issues/1816. I will let you guys know once I can run the test in my Istio env.

Zogoo avatar Jul 15 '22 06:07 Zogoo

@vsukhin cc: @ypoplavs , @exu

After I uninstalled and re-installed "testkube", the situation changed from #1816. I hope the following information will help you guys' work.

Version:

$kubectl testkube version

Client Version 1.3.4
Server Version v1.3.14
Commit
Built by Homebrew
Build date

$helm search repo kubeshop
NAME                            	CHART VERSION	APP VERSION	DESCRIPTION
kubeshop/api-server             	0.11.16      	0.11.16    	A Helm chart for Api-server
kubeshop/kusk-gateway           	0.0.28       	v1.1.2     	A Helm chart for Kusk Gateway
kubeshop/kusk-gateway-api       	0.1.5        	v1.1.0     	A Helm chart for the Kusk Gateway API
kubeshop/kusk-gateway-dashboard 	0.1.4        	v1.1.0     	A Helm chart for Kusk Gateway API's dashboard
kubeshop/kusk-gateway-envoyfleet	0.0.3        	v0.0.0     	A Helm chart for Kusk Gateway EnvoyFleet
kubeshop/testkube               	1.3.46       	           	A Helm chart for testkube.
kubeshop/testkube-api           	1.3.14       	1.3.14     	A Helm chart for Testkube api
kubeshop/testkube-dashboard     	1.3.3        	1.3.3      	A Helm chart for Kubernetes
kubeshop/testkube-operator      	1.3.14       	           	A Helm chart for the testkube-operator (install...
kubeshop/tracetest              	0.1.34       	v0.5.6     	A Helm chart for Trace Test

Issue: No execution log

$kubectl testkube watch execution 62d6622aefb5328d0a1c5921
Getting pod logs
Execution completed
⨯ unexpected end of JSON input
Use following command to get test execution details:
$ kubectl testkube get execution 62d6622aefb5328d0a1c5921

# ----- No output message, it just ends with "started"
$kube testkube get execution 62d6622aefb5328d0a1c5921
ID:        62d6622aefb5328d0a1c5921
Name:      pds-test-1
Type:      postman/collection
Duration:

Test execution started

# ------ I can see that the test is executed and returns results
 $kubectl logs 62d6622aefb5328d0a1c5921-8pdwf -n testkube
{"type":"line","content":"  GET http://test-namespace.test-service:80/531bd72c-ad13-46fd-81c4-b61ca5bc51ee/vehicle/attr/type "}
{"type":"line","content":"[401 Unauthorized, 222B, 8ms]\n"}
{"type":"line","content":"\n↳ Update resource\n"}
{"type":"line","content":"  PATCH http://test-namespace.test-service:80/531bd72c-ad13-46fd-81c4-b61ca5bc51ee/address "}
{"type":"line","content":"[401 Unauthorized, 222B, 6ms]\n"}
{"type":"line","content":"  2. Status code is 201\n"}

# --- API service logs
kube logs testkube-api-server-57f7fc998d-thttx -n testkube
I0719 07:48:10.602426       1 request.go:601] Waited for 1.032892643s due to client-side throttling, not priority and fairness, request: GET:https://10.96.0.1:443/apis/telemetry.istio.io/v1alpha1?timeout=32s
Available migrations for v1.3.14
No migrations available for v1.3.14
{"level":"warn","ts":1658216894.2731085,"caller":"api-server/main.go:105","msg":"Getting uniqe clusterId","error":null}
{"level":"info","ts":1658216894.3101046,"caller":"v1/server.go:287","msg":"Testkube API configured","namespace":"testkube","clusterId":"cluster802ac368e062c2fceb2b9cb7e8503c54","telemetry":false}
segment 2022/07/19 07:48:14 ERROR: sending request - Post "https://api.segment.io/v1/batch": read tcp 172.17.0.41:36076->35.163.112.23:443: read: connection reset by peer
segment 2022/07/19 07:48:14 ERROR: 1 messages dropped because they failed to be sent and the client was closed
{"level":"info","ts":1658216894.3746228,"caller":"api-server/main.go:130","msg":"starting Testkube API server","telemetryEnabled":true,"clusterId":"cluster802ac368e062c2fceb2b9cb7e8503c54","namespace":"testkube"}

 ┌───────────────────────────────────────────────────┐
 │                   Fiber v2.31.0                   │
 │               http://127.0.0.1:8088               │
 │       (bound on host 0.0.0.0 and port 8088)       │
 │                                                   │
 │ Handlers ........... 166  Processes ........... 1 │
 │ Prefork ....... Disabled  PID ................. 1 │
 └───────────────────────────────────────────────────┘

segment 2022/07/19 07:48:14 ERROR: sending request - Post "https://api.segment.io/v1/batch": EOF
segment 2022/07/19 07:48:14 ERROR: 1 messages dropped because they failed to be sent and the client was closed
{"level":"error","ts":1658216987.0427742,"caller":"server/httpserver.go:68","msg":"tests.tests.testkube.io \"pds-test\" not found","status":404,"stacktrace":"github.com/kubeshop/testkube/pkg/server.(*HTTPServer).Error\n\t/build/pkg/server/httpserver.go:68\ngithub.com/kubeshop/testkube/internal/app/api/v1.TestkubeAPI.GetTestHandler.func1\n\t/build/internal/app/api/v1/tests.go:30\ngithub.com/gofiber/fiber/v2.(*App).next\n\t/go/pkg/mod/github.com/gofiber/fiber/[email protected]/router.go:132\ngithub.com/gofiber/fiber/v2.(*Ctx).Next\n\t/go/pkg/mod/github.com/gofiber/fiber/[email protected]/ctx.go:792\ngithub.com/kubeshop/testkube/internal/app/api/v1.TestkubeAPI.AuthHandler.func1\n\t/build/internal/app/api/v1/handlers.go:57\ngithub.com/gofiber/fiber/v2.(*Ctx).Next\n\t/go/pkg/mod/github.com/gofiber/fiber/[email protected]/ctx.go:789\ngithub.com/gofiber/fiber/v2/middleware/cors.New.func1\n\t/go/pkg/mod/github.com/gofiber/fiber/[email protected]/middleware/cors/cors.go:141\ngithub.com/gofiber/fiber/v2.(*App).next\n\t/go/pkg/mod/github.com/gofiber/fiber/[email protected]/router.go:132\ngithub.com/gofiber/fiber/v2.(*Ctx).Next\n\t/go/pkg/mod/github.com/gofiber/fiber/[email protected]/ctx.go:792\ngithub.com/kubeshop/testkube/pkg/server.(*HTTPServer).Init.func1\n\t/build/pkg/server/httpserver.go:40\ngithub.com/gofiber/fiber/v2.(*App).next\n\t/go/pkg/mod/github.com/gofiber/fiber/[email protected]/router.go:132\ngithub.com/gofiber/fiber/v2.(*App).handler\n\t/go/pkg/mod/github.com/gofiber/fiber/[email protected]/router.go:160\ngithub.com/valyala/fasthttp.(*Server).serveConn\n\t/go/pkg/mod/github.com/valyala/[email protected]/server.go:2341\ngithub.com/valyala/fasthttp.(*workerPool).workerFunc\n\t/go/pkg/mod/github.com/valyala/[email protected]/workerpool.go:224\ngithub.com/valyala/fasthttp.(*workerPool).getCh.func1\n\t/go/pkg/mod/github.com/valyala/[email protected]/workerpool.go:196"}
{"level":"info","ts":1658216987.0699482,"caller":"v1/tests.go:314","msg":"creating test","request":{"name":"test-collection","namespace":"testkube","type":"postman/collection","content":{"type":"string","data":"{\n\t\"info\": {\n\t\t\"_postman_id\": \"eddc6b0d-e79d-41ac-8210-08bb2c32c7f7\"
<.....OMITTING postman collection details...>
{"level":"info","ts":1658217002.6485372,"caller":"v1/executions.go:191","msg":"calling executor with options","options":{"name":"pds-test-1","number":1,"ExecutionLabels":null}}
{"level":"info","ts":1658217004.2015455,"caller":"v1/executions.go:251","msg":"test executed","executionId":"62d6622aefb5328d0a1c5921","status":"running"}

Zogoo avatar Jul 19 '22 08:07 Zogoo

Hey, @Zogoo. What do you see after the test is executed, if you run kubectl get pods -n testkube? Are the executor pods in a completed state? If not, can you check their logs and the output of the describe pods commands?

vsukhin avatar Jul 19 '22 08:07 vsukhin

@vsukhin nope, they are in the same situation that I mentioned in the other issue #1816

zogoo% kube get pods -n testkube
NAME                                                    READY   STATUS     RESTARTS   AGE
62d6622aefb5328d0a1c5921-8pdwf                          1/2     NotReady   0          18h
testkube-api-server-57f7fc998d-thttx                    2/2     Running    1          19h
testkube-dashboard-6bd9dd679f-vpm2r                     2/2     Running    0          19h
testkube-minio-testkube-64cd475b94-wsqbf                2/2     Running    0          19h
testkube-mongodb-6c9c5db4d5-bphrx                       2/2     Running    0          19h
testkube-operator-controller-manager-85b979bbf6-gllwj   3/3     Running    1          19h

Describing the pod returns a massive amount of information; if you tell me which part is useful for your investigation, I will be happy to share it here.

Zogoo avatar Jul 20 '22 02:07 Zogoo

Thank you @Zogoo. I guess it's important to understand why they are not ready; that should give us some ideas about what is wrong. You can copy the last 15 lines of the describe command. If you want, we can have a short Google Meet session at this time tomorrow, or whenever you have time, to check them together.

vsukhin avatar Jul 20 '22 05:07 vsukhin

Hey @Zogoo, here is my Calendly link: https://calendly.com/yulia-poplavska/30min Please choose a time that suits you.

ypoplavs avatar Jul 20 '22 05:07 ypoplavs

@ypoplavs I booked it for the 9th of August. Looking forward to seeing you there.

Zogoo avatar Jul 21 '22 05:07 Zogoo

To resolve this, disable Istio injection in the testkube namespace and re-install the tool:

  • remove Testkube from your Istio cluster: helm uninstall testkube -n testkube
  • run: kubectl label namespace testkube istio-injection=disabled --overwrite
  • install Testkube: helm install testkube testkube/testkube

It's important to verify that the Istio sidecar is no longer injected into the testkube namespace, since the sidecar prevents jobs from executing correctly.
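Equivalently, the injection label from the second step can be set declaratively on the namespace manifest (a sketch of the same label the kubectl command applies):

```yaml
# Declarative equivalent of:
#   kubectl label namespace testkube istio-injection=disabled --overwrite
apiVersion: v1
kind: Namespace
metadata:
  name: testkube
  labels:
    istio-injection: disabled
```

After re-installing, the READY column in kubectl get pods -n testkube should no longer count an extra istio-proxy container per pod, unlike the 2/2 and 3/3 listings shown earlier in this thread.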

ypoplavs avatar Oct 04 '22 14:10 ypoplavs