Add options to log to Stackdriver from non-App Engine/k8s bots
On some of our Android fuzzing hosts, we don't have the fluentd server that takes care of propagating the Python logs to Stackdriver. I suggest this change, which adds the option to log there directly. It was a bit of a mess to add the additional fields, given that we are stuck with a relatively old version of the Google Cloud Logging API. I am also in a situation where three GCP projects are involved (one for the service account, one for the queues and one for the logging). I also added a bunch of variables to make it customizable (to choose which of the console/file/fluentd/GCP logs to activate). Alternatively, we might be better off using an input dictConfig or fileConfig (https://docs.python.org/3/library/logging.config.html).
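For reference, here is a minimal sketch of what the dictConfig alternative could look like. It is not what this PR implements. `LOG_TO_GCP` and `LOG_TO_FLUENTD` are the variables used in this change; `LOG_TO_CONSOLE`, `LOG_TO_FILE`, the `bot.log` filename and the helper names are hypothetical, and the fluentd/GCP handlers are replaced with `logging.NullHandler` stand-ins so the sketch runs with the standard library only.

```python
import logging
import logging.config

# Maps an environment flag to a (handler name, dictConfig handler entry).
# LOG_TO_CONSOLE and LOG_TO_FILE are hypothetical names for illustration.
_HANDLERS = {
    'LOG_TO_CONSOLE': ('console', {'class': 'logging.StreamHandler'}),
    'LOG_TO_FILE': ('file', {
        'class': 'logging.FileHandler',
        'filename': 'bot.log',  # hypothetical path
        'delay': True,  # do not open the file until the first record
    }),
    # Stand-ins: a real setup would point these at the fluentd and
    # Cloud Logging handlers instead of NullHandler.
    'LOG_TO_FLUENTD': ('fluentd', {'class': 'logging.NullHandler'}),
    'LOG_TO_GCP': ('gcp', {'class': 'logging.NullHandler'}),
}


def _is_enabled(env, name):
  """Returns True if the flag is set to a truthy string in env."""
  return env.get(name, '').strip().lower() in ('1', 'true', 'yes')


def build_logging_config(env):
  """Builds a logging dictConfig that enables only the requested handlers."""
  handlers = {}
  for flag, (handler_name, handler_config) in _HANDLERS.items():
    if _is_enabled(env, flag):
      handlers[handler_name] = dict(handler_config)
  return {
      'version': 1,
      'disable_existing_loggers': False,
      'handlers': handlers,
      'root': {'level': 'INFO', 'handlers': sorted(handlers)},
  }
```

Usage would be something like `logging.config.dictConfig(build_logging_config(os.environ))` at startup (after `import os`), so each destination is toggled by a single variable.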
/gcbrun
@oliverchang I just realized that you had #3422. I am not sure how relevant this is once you land it, but in the meantime we have limited logs on our Chrome Android hosts. Given the specifics of those hosts (logs are written to a different project than the one for the service account and the cloud_project_id), we would still need some of the configurability we have here.
@oliverchang gentle ping on this one :)
Hmm, I opened https://github.com/google/clusterfuzz/pull/3422 for exactly this :) But we are blocked on a Python upgrade for that.
I couldn't get structured logging to work with the older client libraries. Does your PR make that work?
Not really. I worked around this issue by adding labels (see the example below). That's ugly TBH, but as mentioned we're missing out on Python logs on several of our bots (the ones hosted by Android's fuzzing infra). I also kept the existing (fluentd) behavior, as I wasn't sure whether we wanted to remove it. Long term (i.e. when Python is updated) I believe we should have only the content of your PR.
log:
{ "insertId": "5tvjopf2hctof", "jsonPayload": { "python_logger": "android_heartbeat", "message": "Android device 2C011FDH3007VB state: device" }, "resource": { "type": "global", "labels": { "project_id": "google.com:clusterfuzz" } }, "timestamp": "2024-01-15T13:57:36.444799Z", "severity": "INFO", "labels": { "compute.googleapis.com/resource_name": "pmeuleman1.roam.corp.google.com", "fuzz_target": "null", "location": "{\"path\": \"/mnt/scratch0/cf/clusterfuzz/src/python/bot/startup/android_heartbeat.py\", \"line\": 53, \"method\": \"main\"}", "task_payload": "null", "bot_name": "android-haiku-chrome-test", "worker_bot_name": "null", "extra": "{}" }, "logName": "projects/google.com:clusterfuzz/logs/python", "receiveTimestamp": "2024-01-15T13:57:36.465772799Z" }
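To illustrate the label workaround, here is a rough sketch of how the extra fields could be flattened into string-only labels, which is the shape seen in the entry above (`"fuzz_target": "null"`, `"extra": "{}"`, `location` as a JSON string). The `to_labels` helper is a hypothetical name for illustration and not necessarily how the PR does it; attaching the resulting labels to an entry depends on the client library version and is left out.

```python
import json


def to_labels(fields):
  """Converts a dict of extra log fields into string-only labels.

  Strings are kept as-is; every other value (None, dicts, numbers) is
  JSON-encoded so that it fits into a string label.
  """
  labels = {}
  for key, value in fields.items():
    labels[key] = value if isinstance(value, str) else json.dumps(value)
  return labels
```

The downside of this approach is that nested values such as `location` are only searchable as text in the logs viewer, not as structured fields, which is why structured logging from #3422 would be preferable once it is unblocked.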
LGTM, but are you able to check this doesn't break existing fluentd logging?
You can test this by following https://github.com/google/clusterfuzz/blob/master/local/README.md#running-a-bot-locally to run a Docker container locally that closely replicates a GCE bot.
I tested locally, but at that time I did not know how to start a local fluentd, so for that part I used netcat to check that the output was similar to what our production clients send.
I managed to have it running now that I understood how it was installed, and got the following logs (I used both LOG_TO_GCP=TRUE and LOG_TO_FLUENTD=TRUE, hence the dupes): https://pantheon.corp.google.com/logs/query;query=SEARCH%2528%222C011FDH3007VB%22%2529%0A-protoPayload.methodName%3D%22google.logging.v2.LoggingServiceV2.ReadLogEntriesLegacy%22%0A-protoPayload.methodName%3D%22google.logging.v2.LoggingServiceV2.AggregateLogs%22%0Aseverity%3E%3DINFO;storageScope=storage,projects%2Fgoogle.com:clusterfuzz%2Flocations%2Fglobal%2Fbuckets%2F_Required%2Fviews%2F_AllLogs,projects%2Fgoogle.com:clusterfuzz%2Flocations%2Fglobal%2Fbuckets%2F_Default%2Fviews%2F_AllLogs,projects%2Fgoogle.com:clusterfuzz%2Flocations%2Fglobal%2Fbuckets%2F_Default%2Fviews%2F_Default,projects%2Fcluster-fuzz%2Flocations%2Fglobal%2Fbuckets%2F_Default%2Fviews%2F_AllLogs,projects%2Fcluster-fuzz%2Flocations%2Fglobal%2Fbuckets%2F_Default%2Fviews%2F_Default,projects%2Fcluster-fuzz%2Flocations%2Fglobal%2Fbuckets%2F_Required%2Fviews%2F_AllLogs;summaryFields=:false:32:beginning;cursorTimestamp=2024-02-19T16:31:38.310903463Z;startTime=2024-02-19T16:30:00.000Z;endTime=2024-02-19T16:32:00.000Z?project=google.com:clusterfuzz&e=-13802955&mods=logs_tg_prod&pli=1
/gcbrun