[QA] [Flaky] Fails to reach service debug address

Open ishabaral opened this issue 9 months ago • 5 comments

Builds:

  • https://drone.owncloud.com/owncloud/ocis/44210/17/5
  • https://drone.owncloud.com/owncloud/ocis/44197/21/5
  • https://drone.owncloud.com/owncloud/ocis/44195/21/5

    When a user requests these URLs with "GET" and no authentication                         # AuthContext::aUserRequestsTheseUrlsWithAndNoAuthentication()
      | endpoint                                | service |
      | http://%base_url_hostname%:9229/healthz | audit   |
      | http://%base_url_hostname%:9229/readyz  | audit   |
      cURL error 7: Failed to connect to ocis-server port 9229: Connection refused (see https://curl.haxx.se/libcurl/c/libcurl-errors.html) for http://ocis-server:9229/healthz (GuzzleHttp\Exception\ConnectException)
==> REQUEST
	GET /healthz
	X-Request-ID: apiServiceAvailability/serviceAvailabilityCheck.feature:142-150

Scenarios:

  /drone/src/tests/acceptance/features/apiServiceAvailability/serviceAvailabilityCheck.feature:111
  /drone/src/tests/acceptance/features/apiServiceAvailability/serviceAvailabilityCheck.feature:120
  /drone/src/tests/acceptance/features/apiServiceAvailability/serviceAvailabilityCheck.feature:131
  /drone/src/tests/acceptance/features/apiServiceAvailability/serviceAvailabilityCheck.feature:142

ociswrapper log

2025/03/14 01:05:31 [ociswrapper] ocis service port 9250 is no longer reachable
2025/03/14 01:05:31 [ociswrapper] Restarting oCIS server...
2025/03/14 01:05:31 [ociswrapper] Starting oCIS service...
2025/03/14 01:05:34 [ociswrapper] oCIS server is ready to accept requests
2025/03/14 01:05:34 Starting audit service...
{"level":"info","service":"audit","service":"audit","endpoint":"/healthz","time":"2025-03-14T01:05:34Z","line":"/drone/src/ocis-pkg/service/debug/service.go:27","message":"no probe provided, reverting to default (OK)"}
2025/03/14 01:05:36 [ociswrapper] audit service is ready to listen on port 9229
2025/03/14 01:05:38 [ociswrapper] audit service is ready to listen on port 9229
{"level":"info","service":"audit","transport":"stream","server":"audit","time":"2025-03-14T01:05:38Z","line":"/drone/src/services/audit/pkg/command/server.go:56","message":"Shutting down server"}
2025/03/14 01:05:40 [ociswrapper] audit service port 9229 is no longer reachable
2025/03/14 01:05:40 audit service stopped successfully
2025/03/14 01:05:40 [ociswrapper] Stopping oCIS server...
2025/03/14 01:05:42 [ociswrapper] ocis service port 9250 is no longer reachable
2025/03/14 01:05:42 [ociswrapper] Restarting oCIS server...
2025/03/14 01:05:42 [ociswrapper] Starting oCIS service...
2025/03/14 01:05:44 [ociswrapper] oCIS server is ready to accept requests

ishabaral avatar Mar 14 '25 04:03 ishabaral

Even though the audit service port starts listening, the http://ocis-server:9229/healthz endpoint does not seem to be available.

Log

2025/03/18 05:47:33 Starting audit service...
{"level":"info","service":"audit","service":"audit","endpoint":"/healthz","time":"2025-03-18T05:47:33Z","line":"/drone/src/ocis-pkg/service/debug/service.go:27","message":"no probe provided, reverting to default (OK)"}
2025/03/18 05:47:35 audit service is ready to listen on port 9229
{"level":"info","service":"audit","transport":"stream","server":"audit","time":"2025-03-18T05:47:37Z","line":"/drone/src/services/audit/pkg/command/server.go:56","message":"Shutting down server"}
2025/03/18 05:47:39 audit service stopped successfully
2025/03/18 05:47:39 [ociswrapper] Stopping oCIS server...

Every time the test fails, we get this log:

{"level":"info","service":"audit","service":"audit","endpoint":"/healthz","time":"2025-03-18T05:47:33Z","line":"/drone/src/ocis-pkg/service/debug/service.go:27","message":"no probe provided, reverting to default (OK)"}

amrita-shrestha avatar Mar 18 '25 05:03 amrita-shrestha

cc @2403905

amrita-shrestha avatar Mar 20 '25 04:03 amrita-shrestha

~The audit service doesn't have the healthz check, only the readyz.~ ~The auth-bearer doesn't have any probes.~

~There is a server probes table:~ ~https://github.com/owncloud/ocis/issues/10281~

2403905 avatar Mar 20 '25 13:03 2403905

Each service has default /healthz and /readyz endpoints. If the service doesn't define any additional checks, it falls back to the default: we see "no probe provided, reverting to default (OK)" in the logs, and the endpoint responds with HTTP 200. https://github.com/owncloud/ocis/blob/cdb179f656235a0f99f67f9b41c5104440354fed/ocis-pkg/service/debug/service.go#L27 https://github.com/owncloud/ocis/blob/cdb179f656235a0f99f67f9b41c5104440354fed/ocis-pkg/service/debug/service.go#L48

In the failed test, curl returns error 7 (failed to connect to host or proxy): https://drone.owncloud.com/owncloud/ocis/44197/21/5 It looks like the audit service is down at that moment. Can the ociswrapper stop a service before the tests are done?

2403905 avatar Mar 20 '25 16:03 2403905

Recent failures:

  • https://drone.owncloud.com/owncloud/ocis/44542/17/5
  • https://drone.owncloud.com/owncloud/ocis/44558/17/5
  • https://drone.owncloud.com/owncloud/ocis/44664/17/5
  • https://drone.owncloud.com/owncloud/ocis/44674/17/5
  • https://drone.owncloud.com/owncloud/ocis/44695/17/5

saw-jan avatar Mar 27 '25 05:03 saw-jan

Possible fix: #11198. Let's see how the builds go.

saw-jan avatar Apr 03 '25 05:04 saw-jan

unassigning myself

amrita-shrestha avatar Apr 03 '25 12:04 amrita-shrestha

@nirajacharya2 @saw-jan @anon-pradip

CI => https://drone.owncloud.com/owncloud/ocis/45074/21/5

      cURL error 7: Failed to connect to ocis-server port 9134: Connection refused (see https://curl.haxx.se/libcurl/c/libcurl-errors.html) for http://ocis-server:9134/healthz (GuzzleHttp\Exception\ConnectException)

amrita-shrestha avatar Apr 15 '25 05:04 amrita-shrestha

The main issue described in this ticket has been fixed by https://github.com/owncloud/ocis/pull/11198, and we no longer see the failures mentioned in the ticket description. Closing this issue.

saw-jan avatar Apr 23 '25 04:04 saw-jan

@nirajacharya2 @saw-jan @anon-pradip

CI => https://drone.owncloud.com/owncloud/ocis/45074/21/5

      cURL error 7: Failed to connect to ocis-server port 9134: Connection refused (see https://curl.haxx.se/libcurl/c/libcurl-errors.html) for http://ocis-server:9134/healthz (GuzzleHttp\Exception\ConnectException)

This is a similar but different failure. Please open a separate ticket for it.

saw-jan avatar Apr 23 '25 04:04 saw-jan