[bug] osv-certifier process should exit in case of error

Open sunnyyip opened this issue 2 years ago • 8 comments

Describe the bug osv-certifier process should exit in case of error

The osv-certifier process appears to stay running after an (unrecoverable?) error occurs.

For example, when the graphql server is not available as the certifier starts, the certifier reports an error but its process stays running:

{"level":"info","ts":1690375037.1687462,"caller":"cli/init.go:53","msg":"Using config file: /guac/guac.yaml"}
{"level":"error","ts":1690375037.1761382,"caller":"cmd/osv.go:129","msg":"certifier ended with error: failed sources query: Post "http://graphql-server.guac-qgs3jx30vi.svc.cluster.local:8080/query": dial tcp 172.20.127.164:8080: connect: connection refused","stacktrace":"github.com/guacsec/guac/cmd/guacone/cmd.glob..func9.2\n\t/home/runner/work/guac/guac/cmd/guacone/cmd/osv.go:129\ngithub.com/guacsec/guac/pkg/certifier/certify.Certify.func1\n\t/home/runner/work/guac/guac/pkg/certifier/certify/certify.go:78\ngithub.com/guacsec/guac/pkg/certifier/certify.Certify\n\t/home/runner/work/guac/guac/pkg/certifier/certify/certify.go:96\ngithub.com/guacsec/guac/cmd/guacone/cmd.glob..func9.3\n\t/home/runner/work/guac/guac/cmd/guacone/cmd/osv.go:141"}

The depsdev-collector, ingestor, and oci-collector processes do exit in the same situation.

To Reproduce Steps to reproduce the behavior:

Run the osv-certifier without the graphql server running, then observe the process state and logs.

Expected behavior The osv-certifier process should either exit or, for a recoverable error, retry. When running in a container, the exit causes the orchestrator to restart the container, which recovers from the failure.

The depsdev-collector, ingestor, and oci-collector processes do exit in the same situation.

GUAC version v0.1.1

sunnyyip, Jul 26 '23 13:07

@sunnyyip Can I take this issue?

arorasoham9, Jul 26 '23 18:07

Sure, assigning it to you!

pxp928, Jul 26 '23 18:07

When I tried to reproduce the issue, the osv-certifier process did exit when the graphql server was not running. Is there a particular condition under which it does not exit?

arorasoham9, Aug 03 '23 18:08

hmm @sunnyyip might be able to provide further insight

pxp928, Aug 03 '23 18:08

issue name should be: hotel california osv-certifier :)

lumjjb, Aug 18 '23 14:08

We were running into the same issue.

Could I work on this?

naveensrinivasan, Sep 20 '23 22:09

For the fix, I was planning to do an exponential backoff during initialization before the process terminates.

naveensrinivasan, Sep 20 '23 22:09

I'm seeing this with the cd certifier today, in addition to the osv certifier: when the graphql-server isn't available as the certifier starts, the certifier emits an error and then does nothing. The container continues to run and reports a healthy status.

This is my pod spec:

spec:
  containers:
  - image: ghcr.io/guacsec/guac:v0.8.0
    command:
    - sh
    - -c
    - /opt/guac/guaccollect cd

osv-certifier log

{"level":"info","ts":1722453079.3718085,"caller":"logging/logger.go:78","msg":"Logging at info level","guac-version":"v0.8.0"}
{"level":"info","ts":1722453079.3719168,"caller":"cli/init.go:65","msg":"Using config file: /guac/guac.yaml","guac-version":"v0.8.0"}
{"level":"error","ts":1722453082.8948643,"caller":"cmd/osv.go:228","msg":"certifier ended with error: failed neighbors query: returned error 502 Bad Gateway: <html>\r\n<head><title>502 Bad Gateway</title></head>\r\n<body>\r\n<center><h1>502 Bad Gateway</h1></center>\r\n</body>\r\n</html>\r\n","guac-version":"v0.8.0","stacktrace":"github.com/guacsec/guac/cmd/guaccollect/cmd.initializeNATsandCertifier.func2\n\t/home/runner/work/guac/guac/cmd/guaccollect/cmd/osv.go:228\ngithub.com/guacsec/guac/pkg/certifier/certify.Certify.func1\n\t/home/runner/work/guac/guac/pkg/certifier/certify/certify.go:77\ngithub.com/guacsec/guac/pkg/certifier/certify.Certify\n\t/home/runner/work/guac/guac/pkg/certifier/certify/certify.go:95\ngithub.com/guacsec/guac/cmd/guaccollect/cmd.initializeNATsandCertifier.func3\n\t/home/runner/work/guac/guac/cmd/guaccollect/cmd/osv.go:239"}

cd-certifier log

{"level":"info","ts":1722453079.8838563,"caller":"logging/logger.go:78","msg":"Logging at info level","guac-version":"v0.8.0"}
{"level":"info","ts":1722453079.883925,"caller":"cli/init.go:65","msg":"Using config file: /guac/guac.yaml","guac-version":"v0.8.0"}
{"level":"error","ts":1722453083.1917777,"caller":"cmd/osv.go:228","msg":"certifier ended with error: failed neighbors query: returned error 502 Bad Gateway: <html>\r\n<head><title>502 Bad Gateway</title></head>\r\n<body>\r\n<center><h1>502 Bad Gateway</h1></center>\r\n</body>\r\n</html>\r\n","guac-version":"v0.8.0","stacktrace":"github.com/guacsec/guac/cmd/guaccollect/cmd.initializeNATsandCertifier.func2\n\t/home/runner/work/guac/guac/cmd/guaccollect/cmd/osv.go:228\ngithub.com/guacsec/guac/pkg/certifier/certify.Certify.func1\n\t/home/runner/work/guac/guac/pkg/certifier/certify/certify.go:77\ngithub.com/guacsec/guac/pkg/certifier/certify.Certify\n\t/home/runner/work/guac/guac/pkg/certifier/certify/certify.go:95\ngithub.com/guacsec/guac/cmd/guaccollect/cmd.initializeNATsandCertifier.func3\n\t/home/runner/work/guac/guac/cmd/guaccollect/cmd/osv.go:239"}

sunnyyip, Jul 31 '24 19:07