[bug] osv-certifier process should exit in case of error
Describe the bug
The osv-certifier process appears to stay running after an (unrecoverable?) error occurs.
For example, when the GraphQL server is not available as the certifier starts, the certifier logs an error and its process stays running:
{"level":"info","ts":1690375037.1687462,"caller":"cli/init.go:53","msg":"Using config file: /guac/guac.yaml"}
{"level":"error","ts":1690375037.1761382,"caller":"cmd/osv.go:129","msg":"certifier ended with error: failed sources query: Post "http://graphql-server.guac-qgs3jx30vi.svc.cluster.local:8080/query": dial tcp 172.20.127.164:8080: connect: connection refused","stacktrace":"github.com/guacsec/guac/cmd/guacone/cmd.glob..func9.2\n\t/home/runner/work/guac/guac/cmd/guacone/cmd/osv.go:129\ngithub.com/guacsec/guac/pkg/certifier/certify.Certify.func1\n\t/home/runner/work/guac/guac/pkg/certifier/certify/certify.go:78\ngithub.com/guacsec/guac/pkg/certifier/certify.Certify\n\t/home/runner/work/guac/guac/pkg/certifier/certify/certify.go:96\ngithub.com/guacsec/guac/cmd/guacone/cmd.glob..func9.3\n\t/home/runner/work/guac/guac/cmd/guacone/cmd/osv.go:141"}
The depsdev-collector, ingestor, and oci-collector processes do exit in the same situation.
To Reproduce
Steps to reproduce the behavior:
Run the osv-certifier without the GraphQL server running, then observe the process state and logs.
Expected behavior
The osv-certifier process should either exit or retry when it hits a recoverable error. When running in a container, the exit causes the orchestrator to restart the container, which recovers from the failure.
GUAC version v0.1.1
@sunnyyip Can I take this issue?
Sure assigning it to you!
When I tried to reproduce the issue, the osv-certifier process did exit when the GraphQL server was not running. Is there a particular condition under which it does not exit?
hmm @sunnyyip might be able to provide further insight
issue name should be: hotel california osv-certifier :)
We were running into the same issue.
Could I work on this?
For the fix, I was planning to do an exponential backoff during initialization before the process terminates.
I'm seeing this with the cd certifier today, in addition to the osv certifier: when the graphql-server wasn't available as the certifier started, the certifier emitted an error and then did nothing. The container continues to run and reports a healthy status.
This is my pod spec:
spec:
  containers:
  - image: ghcr.io/guacsec/guac:v0.8.0
    command:
    - sh
    - -c
    - /opt/guac/guaccollect cd
osv-certifier log
{"level":"info","ts":1722453079.3718085,"caller":"logging/logger.go:78","msg":"Logging at info level","guac-version":"v0.8.0"}
{"level":"info","ts":1722453079.3719168,"caller":"cli/init.go:65","msg":"Using config file: /guac/guac.yaml","guac-version":"v0.8.0"}
{"level":"error","ts":1722453082.8948643,"caller":"cmd/osv.go:228","msg":"certifier ended with error: failed neighbors query: returned error 502 Bad Gateway: <html>\r\n<head><title>502 Bad Gateway</title></head>\r\n<body>\r\n<center><h1>502 Bad Gateway</h1></center>\r\n</body>\r\n</html>\r\n","guac-version":"v0.8.0","stacktrace":"github.com/guacsec/guac/cmd/guaccollect/cmd.initializeNATsandCertifier.func2\n\t/home/runner/work/guac/guac/cmd/guaccollect/cmd/osv.go:228\ngithub.com/guacsec/guac/pkg/certifier/certify.Certify.func1\n\t/home/runner/work/guac/guac/pkg/certifier/certify/certify.go:77\ngithub.com/guacsec/guac/pkg/certifier/certify.Certify\n\t/home/runner/work/guac/guac/pkg/certifier/certify/certify.go:95\ngithub.com/guacsec/guac/cmd/guaccollect/cmd.initializeNATsandCertifier.func3\n\t/home/runner/work/guac/guac/cmd/guaccollect/cmd/osv.go:239"}
cd-certifier log
{"level":"info","ts":1722453079.8838563,"caller":"logging/logger.go:78","msg":"Logging at info level","guac-version":"v0.8.0"}
{"level":"info","ts":1722453079.883925,"caller":"cli/init.go:65","msg":"Using config file: /guac/guac.yaml","guac-version":"v0.8.0"}
{"level":"error","ts":1722453083.1917777,"caller":"cmd/osv.go:228","msg":"certifier ended with error: failed neighbors query: returned error 502 Bad Gateway: <html>\r\n<head><title>502 Bad Gateway</title></head>\r\n<body>\r\n<center><h1>502 Bad Gateway</h1></center>\r\n</body>\r\n</html>\r\n","guac-version":"v0.8.0","stacktrace":"github.com/guacsec/guac/cmd/guaccollect/cmd.initializeNATsandCertifier.func2\n\t/home/runner/work/guac/guac/cmd/guaccollect/cmd/osv.go:228\ngithub.com/guacsec/guac/pkg/certifier/certify.Certify.func1\n\t/home/runner/work/guac/guac/pkg/certifier/certify/certify.go:77\ngithub.com/guacsec/guac/pkg/certifier/certify.Certify\n\t/home/runner/work/guac/guac/pkg/certifier/certify/certify.go:95\ngithub.com/guacsec/guac/cmd/guaccollect/cmd.initializeNATsandCertifier.func3\n\t/home/runner/work/guac/guac/cmd/guaccollect/cmd/osv.go:239"}