woodpecker
woodpecker copied to clipboard
k8s agent fails to tail step logs
Component
agent
Describe the bug
For some of our repos, trivy scan step, triggers the agent to log this error. I noticed this not just with trivy but also with megalinter.
{ "level": "error", "error": "rpc error: code = Internal desc = grpc: error while marshaling: string field contains invalid UTF-8", "time": "2023-08-20T07:18:46Z", "message": "grpc error: log(): code: Internal: rpc error: code = Internal desc = grpc: error while marshaling: string field contains invalid UTF-8" }
Trivy command executed as part of the step.
trivy fs \ --exit-code 1 \ --skip-dirs vendor,node_modules \ --report summary \ --severity CRITICAL \ --scanners vuln \ --ignore-unfixed \ --quiet \ .
System Info
{"source":"https://github.com/woodpecker-ci/woodpecker","version":"next-0c282e86e8"}
Additional context
No response
Validations
- [X] Read the Contributing Guidelines.
- [X] Read the docs.
- [X] Check that there isn't already an issue that reports the same bug to avoid creating a duplicate.
- [X] Checked that the bug isn't fixed in the
next
version already [https://woodpecker-ci.org/faq#which-version-of-woodpecker-should-i-use] - [x] Check that this is a concrete bug. For Q&A join our Discord Chat Server or the Matrix room.
Is this still an issue?
yes.... have the same problem (woodpecker 2.3.0).... making woodpecker-agent spin indefinitely using a lot of cpu... have to kill the agent frequently to make it work again.... it's annoying....
Can you get debug logs?
I have the same issue (2.3.0 with k8s agent). This example workflow triggers this issue:
steps:
import-gpg:
image: debian:stable-slim
commands:
- apt-get -qq update
- apt-get -qq install --yes --no-install-recommends ca-certificates wget gpg gpg-agent dirmngr
- wget --quiet https://downloads.apache.org/maven/KEYS
- gpg --import KEYS
While importing the KEYS woodpecker agent logs:
{"level":"error","error":"rpc error: code = Internal desc = grpc: error while marshaling: string field contains invalid UTF-8","time":"2024-02-21T10:02:11Z","message":"grpc error: log(): code: Internal"}
I wrote that above already: please provide debug logs if possible. They contain extra fields like the file and line where this is logged. Otherwise it's hard to help here (at least for me, maybe the people knowing more about k8s can tell you the issue).
With WOODPECKER_DEBUG_PRETTY=true and WOODPECKER_LOG_LEVEL=trace i got:
1:07PM TRC src/pipeline/rpc/log_entry.go:82 > grpc write line: gpg: key 4F9BDB78DB596386: public key "Vincent Siveton <[email protected]>" imported
gpg: key B95BBD3FA43C4492: 22 signatures not checked due to missing keys
gpg: key B95BBD3FA43C4492: public key "Carlos Sanchez <[email protected]>" imported
gpg: key E181870DC625BAFB: public key "Maria Odea Ching <[email protected]>" imported
gpg: key D2A765937BA507E8: public key "Mauro Talevi <[email protected]>" imported
gpg: key C92C5FEC70161C62: public key "Herve Boutemy <[email protected]>" imported
gpg: key 873A8E86B4372146: public key "Olivier Lamy <[email protected]>" imported
gpg: key 8F0FABD30F353251: public key "Dan Fabulich <[email protected]>" imported
gpg: key 0B7B8BED064C851C: public key "Rapha�l Pi�roni <[email protected]>" imported
gpg: key C2D879C8B0874707: public key "Milos Kleint <[email protected]>" imported
gpg: key 8FE7CB51365A46EF: public key "Mark Hobson <[email protected]>" imported
gpg: key A3F9CCC081C4177D: public key "Oleg Gusakov <[email protected]>" imported
gpg: key 30E6F80434A72A7F: public key "John Dennis Casey <[email protected]>" imported
gpg: key E8579C183571506F: public key "Rapha�l Pi�roni <[email protected]>" imported
gpg: key A8AD2A7E9A25CE21: public key "Barrie Treloar <[email protected]>" imported
gpg: key 3B58205B9D7013A9: public key "Dan T. Tran <[email protected]>" imported
step-uuid=01HQ5WD8SBBCTMPE81QTQJS272
1:07PM ERR src/agent/rpc/client_grpc.go:335 > grpc error: log(): code: Internal error="rpc error: code = Internal desc = grpc: error while marshaling: string field contains invalid UTF-8"
This was a problem for me as well so I've attempted a fix here https://github.com/woodpecker-ci/woodpecker/pull/3601
https://github.com/woodpecker-ci/woodpecker/pull/3601#issuecomment-2056975576
I haven't noticed any missing logs on our builds. However I think the true test would be to run a build that outputs non UTF-8 characters and see how Woodpecker behaves. Given the issue, I'd expect Woodpecker to not print them
Indeed, WP doesn't print them.
I've reproduced this with pipeline:
skip_clone: true
steps:
logs:
image: alpine
commands:
- apk add openssl
- openssl rand 1000
- echo Done
WP output:
+ apk add openssl
fetch https://dl-cdn.alpinelinux.org/alpine/v3.19/main/x86_64/APKINDEX.tar.gz
fetch https://dl-cdn.alpinelinux.org/alpine/v3.19/community/x86_64/APKINDEX.tar.gz
(1/3) Upgrading libcrypto3 (3.1.4-r5 -> 3.1.4-r6)
(2/3) Upgrading libssl3 (3.1.4-r5 -> 3.1.4-r6)
(3/3) Installing openssl (3.1.4-r6)
Executing busybox-1.36.1-r15.trigger
OK: 8 MiB in 16 packages
+ openssl rand 1000
wp-01hwa0ya0qqn1exr68j5ftj5tq.log
Pod was in succeeded
state before pipeline cancellation.
But I didn't notice high CPU usage (at least after I cancelled pipeline):
> kubectl top pods -n test-scm-cicd
NAME CPU(cores) MEMORY(bytes)
gitea-0 3m 125Mi
woodpecker-agent-d9b864c76-lqdgj 1m 10Mi
woodpecker-server-75c9794774-nglv7 3m 29Mi
Tested on Woodpecker 2.3.0
in Kubernetes.
woodpecker-server.log
woodpecker-agent.log
As you can see, Agent logs error even after pipeline was cancelled.
Now my pipeline is in status killed
, but the step stays in the running
.
I had the feeling we use base64 encoding for the log data, but I can't find it in the agent code anymore 🤔. Shouldn't base64-encoding the log data solve this issue?
I have a feeling that the encoding happens on the server at some point.
I have a feeling that the encoding happens on the server at some point.
Same for me, but I have no idea where we do this 🙈 Searched for encoding/base64
, but can't find where we do this for the log entries at all. Will try to find it using the debugger when I've some time. Any hints are more than welcome 😉
we use base64 encoding for the log data
Nope:
https://github.com/woodpecker-ci/woodpecker/blob/d0057736f164a8a12e671ff3c39ac2c33b80a7f6/server/model/log.go#L33
postgres=# \c woodpecker_test
woodpecker_test=# \d log_entries
Table "public.log_entries"
Column | Type | Collation | Nullable | Default
---------+---------+-----------+----------+-----------------------------------------
id | bigint | | not null | nextval('log_entries_id_seq'::regclass)
step_id | bigint | | |
time | bigint | | |
line | integer | | |
data | bytea | | |
created | bigint | | |
type | integer | | |
Indexes:
"log_entries_pkey" PRIMARY KEY, btree (id)
"IDX_log_entries_step_id" btree (step_id)
woodpecker_test=# select data from log_entries where id=5;
data
------------------------------------
\x2b20676974206c66732066657463680a
(1 row)
woodpecker_test=# select convert_from(data, 'UTF-8') from log_entries where id=5;
convert_from
-----------------
+ git lfs fetch+
(1 row)
Shouldn't base64-encoding the log data solve this issue?
Yes, I think.
Thanks for the help.
Just had the feeling as we are getting the logs base64 encoded from the api 🤔
https://github.com/woodpecker-ci/woodpecker/blob/d0057736f164a8a12e671ff3c39ac2c33b80a7f6/web/src/components/repo/pipeline/PipelineLog.vue#L298
https://github.com/woodpecker-ci/woodpecker/blob/d0057736f164a8a12e671ff3c39ac2c33b80a7f6/web/src/components/repo/pipeline/PipelineLog.vue#L303
Anyways I guess the solution should be quite simple by base64 encoding in the agent code directly
base64 encoding in the agent code
Should it be decoded after receiving / before storing in DB?
Edit 1:
If we do not want to store Base64 in DB, then I'm for solution mentioned by @fernandrone and @eliasscosta:
edit the .proto and change the string fields to bytes
While doing some testing in #2072 I always got the grpc error: log(): code: Internal: rpc error: code = Internal desc = grpc: error while marshaling: string field contains invalid UTF-8
error when I was sending extremely long log lines.
While doing some testing I always got the error when I was sending extremely long log lines
Seems, I cannot reproduce this.
But I've found other issues(?). Pipeline with 13MB "one"-line log:
skip_clone: true
steps:
logs:
image: alpine
commands:
- apk add openssl
- openssl rand 10000000 | base64 -w 0
- echo Done
Run number 182: I run pipeline. Do not remember status. Server pod restarted with panic. After that I see full log (414 lines). woodpecker-server.log woodpecker-agent.log
Run number 184:
I run pipeline. Step stayed in not started
state, therefore no logs. When I saw pipeline is finished in K3s logs, I refreshed web page: success, full log (415 lines).
Run number 185:
I run pipeline. Step showed up in running
state. When I saw in OpenLens, that some pod (Server or Agent) flashed orange, but there was no pod restarts. In UI there were 274 lines of truncated log. After refreshing the page - the same.
Run number 187:
Pipeline started with step in status running
. Then it moved to success
. In UI there were 290 lines of truncated logs. After refreshing the page - 289 lines.
woodpecker-server.log
woodpecker-agent.log
Run number 188:
Right after 187. I run pipeline, then went to pipelines list page. Server pod restarted, Reason: Error - exit code: 2. Agent did not, but log is truncated (maybe some go selfhealing). After Server restarted, I see pipeline and step in success
state. There is full log with 415 lines.
woodpecker-server.log
woodpecker-agent.log
Woodpecker 2.4.1, K3s 1.29.4.