launcher icon indicating copy to clipboard operation
launcher copied to clipboard

error while marshaling: proto: field \"kolide.agent.LogCollection.Log.Data\" contains invalid UTF-8

Open cyberonstian opened this issue 5 years ago • 20 comments

We are in the process of deploying Kolide to our customers and we get this error on multiple of the windows clients. When the error is logged launcher stops sending data to the server.

We have tried updating osqueryd (3.3.2) and launcher (9.0.2) to latest versions but the error still persists.

If we delete the osquery.db files and restart launcher it works for a while and then the error reappears and the log traffic stops.

We have done some investigation and nailed it down to this query: SELECT * FROM programs

This works perfectly in Kolide fleet dashboard, but when running it scheduled launcher fails to send the logs due to the error message. We have seen the error message on multiple computers but are still investigation what queries triggers those endpoints.

To nail it down a bit further we found the software in the list that had the characters that triggered the error: SELECT * FROM programs WHERE name = "HxD Hex Editor version 1.7.7.0"

What version of fleet are you using (fleet version --full)?

fleet - version 2.0.2 branch: master revision: 8ca0358bf28173685815b79d8683a4239d629a14 build date: 2019-01-18T00:39:40Z build user: zwass go version: go1.11.3¨

What operating system are you using?

Windows 10 Pro 1893

What did you do?

  1. Install Kolide Fleet agent on the client
  2. Start in debug mode
  3. Wait a couple of minutes, we see that the server is recieving data from the client
  4. The error appears in the logs and no data is recieved on the server.

What did you expect to see?

Data sent from launcher.exe to kolide fleet

What did you see instead?

The two first log entries are the error message, the last one is the software that we think is causing them.

{"caller":"publish_logs.go:157","err":"rpc error: code = Internal desc = grpc: error while marshaling: proto: field "kolide.agent.LogCollection.Log.Data" contains invalid UTF-8","errcode":"","logType":"string","log_count":3,"message":"","method":"PublishLogs","reauth":false,"severity":"info","took":"0s","ts":"2019-03-12T12:34:15.8004054Z","uuid":"bde64b33-697d-4782-b8a2-866e0a44e71a"}

{"caller":"extension.go:494","err":"sending string logs: writing logs: transport error sending logs: rpc error: code = Internal desc = grpc: error while marshaling: proto: field "kolide.agent.LogCollection.Log.Data" contains invalid UTF-8","severity":"info","ts":"2019-03-12T12:34:15.8014026Z"}

{"caller":"publish_results.go:168","err":null,"errcode":"","message":"","method":"PublishResults","reauth":false,"results":"[{"query_name":"kolide_distributed_query_352","status":0,"rows":[{"identifying_number":"","install_date":"20180625","install_location":"C:\\Program Files (x86)\\HxD\\","install_source":"","language":"","name":"HxD Hex Editor version 1.7.7.0","publisher":"Ma�l H�rz","uninstall_string":"\"C:\\Program Files (x86)\\HxD\\unins000.exe\"","version":"1.7.7.0"}]}]","severity":"debug","took":"16.9527ms","ts":"2019-03-12T22:34:46.991793Z","uuid":"7729e05c-71a1-4f7b-bd37-8fb37e76e6db"}

cyberonstian avatar Mar 12 '19 22:03 cyberonstian

https://github.com/facebook/osquery/issues/5288

groob avatar Mar 12 '19 23:03 groob

Hello,

This bug still exists and block a normal usage. I'm on Kolide Fleet 2.5.0 on server side and Windows 10 1709 with Kolide Launcher 0.11.9 and osquery 4.2.0 on client side. This a test virtual machine, therefore there are few things installed on it.

I confirm that the bug happen when a query is done on table programs. Then, none query return result, Kolide Launcher can be restarted but is still on a bad state and the quick fix is to delete the local database of Kolide Launcher. But this will block again when the scheduled query is started.

I think the bug is on Kolide Launcher side because when I run osquery with exact same parameters as Kolide Launcher starts it, it works like a charm, osqueryd return results and keep continue to work.

Here is the command I run for osqueryd:

"C:\Program Files\Kolide\Launcher-launcher\bin\osqueryd.exe-updates\1583771796\osqueryd" --tls_hostname=xxx.yyy.zzz:8080 --enroll_secret_path="C:\Program Files\Kolide\Launcher-launcher\conf\secret" --enroll_tls_endpoint=/api/v1/osquery/enroll --config_plugin=tls --config_tls_endpoint=/api/v1/osquery/config --config_refresh=10 --host_identifier=uuid --pidfile="C:\Program Files\Kolide\Launcher-launcher\data\osquery.db\osquery.pid" --database_path="C:\Program Files\Kolide\Launcher-launcher\data\osquery.db\osquery.db" --extensions_socket="C:\Program Files\Kolide\Launcher-launcher\data\osquery.sock" --extensions_timeout=10 -logger_plugin=tls --distributed_plugin=tls --disable_distributed=false --distributed_interval=5 --pack_delimiter=: --host_identifier=uuid --force=true --disable_watchdog --utc --verbose --config_refresh=300 --config_accelerated_refresh=30 --extensions_autoload="C:\Program Files\Kolide\Launcher-launcher\data\osquery.autoload"

Can you fix this bug please? I can do more tests if required.

polak785 avatar Mar 20 '20 09:03 polak785

I think the encoding bug is in osquery, as discussed in https://github.com/osquery/osquery/issues/5288 but I agree that launcher shouldn't crash.

How are you issuing the that query? Is it a scheduled query, or being sent as a distributed query?

directionless avatar Mar 20 '20 11:03 directionless

This is a scheduled query with logging as snapshot.

Yes this is very blocking that Launcher crash because the only fix is to delete local database.

polak785 avatar Mar 20 '20 13:03 polak785

I have not yet spun up a test environment for this, but I don't think we see this on the SaaS side of things. I'm wondering if launcher is still up, but it's stuck in a loop trying to send the same log to the server. Do the launcher logs indicate?

directionless avatar Mar 20 '20 15:03 directionless

Yes the same log occur every minute:

{"caller":"publish_logs.go:157","err":null,"errcode":"","logType":"status","log_count":9,"message":"","method":"PublishLogs","reauth":false,"severity":"debug","took":"34.045ms","ts":"2020-03-19T09:45:29.3740801Z","uuid":"bf28a7df-0748-4076-8c59-09d300b3b8e0"}
{"caller":"publish_logs.go:157","err":"rpc error: code = Internal desc = grpc: error while marshaling: proto: field \"kolide.agent.LogCollection.Log.Data\" contains invalid UTF-8","errcode":"","logType":"string","log_count":4530,"message":"","method":"PublishLogs","reauth":false,"severity":"info","took":"16.3766ms","ts":"2020-03-19T09:45:29.4973711Z","uuid":"7196d32e-f1a0-4d82-8b6f-2be5306376bb"}
{"caller":"extension.go:494","err":"sending string logs: writing logs: transport error sending logs: rpc error: code = Internal desc = grpc: error while marshaling: proto: field \"kolide.agent.LogCollection.Log.Data\" contains invalid UTF-8","severity":"info","ts":"2020-03-19T09:45:29.5006111Z"}
{"caller":"request_queries.go:133","err":null,"method":"RequestQueries","reauth":false,"res":"{\"queries\":{}}","severity":"debug","took":"30.0947ms","ts":"2020-03-19T09:45:37.0671645Z","uuid":"9c8f5303-2baa-4271-81d9-25321910f906"}
{"caller":"request_queries.go:133","err":null,"method":"RequestQueries","reauth":false,"res":"{\"queries\":{}}","severity":"debug","took":"31.2682ms","ts":"2020-03-19T09:45:47.4014405Z","uuid":"d8ffa71b-6486-41a8-8f2c-d87f4013533a"}
{"caller":"request_queries.go:133","err":null,"method":"RequestQueries","reauth":false,"res":"{\"queries\":{}}","severity":"debug","took":"29.0556ms","ts":"2020-03-19T09:45:56.6949539Z","uuid":"299d8b07-3360-47f1-bdd4-b938f0a006d6"}
{"caller":"request_queries.go:133","err":null,"method":"RequestQueries","reauth":false,"res":"{\"queries\":{}}","severity":"debug","took":"30.7393ms","ts":"2020-03-19T09:46:05.0590382Z","uuid":"4ebf9656-9878-4d1f-96e1-285985d5bfba"}
{"caller":"request_queries.go:133","err":null,"method":"RequestQueries","reauth":false,"res":"{\"queries\":{}}","severity":"debug","took":"34.1126ms","ts":"2020-03-19T09:46:12.8515392Z","uuid":"77db8c33-5e0c-4890-8d9f-d41895f6e5a1"}
{"caller":"query_target.go:71","msg":"server does not implement GetTargets","severity":"debug","ts":"2020-03-19T09:46:21.3208714Z"}
{"caller":"request_queries.go:133","err":null,"method":"RequestQueries","reauth":false,"res":"{\"queries\":{}}","severity":"debug","took":"31.79ms","ts":"2020-03-19T09:46:21.4265874Z","uuid":"2fdb6afb-eb1b-499a-b586-f1e938e43d28"}
{"caller":"publish_logs.go:157","err":"rpc error: code = Internal desc = grpc: error while marshaling: proto: field \"kolide.agent.LogCollection.Log.Data\" contains invalid UTF-8","errcode":"","logType":"string","log_count":4530,"message":"","method":"PublishLogs","reauth":false,"severity":"info","took":"10.1363ms","ts":"2020-03-19T09:46:24.2113109Z","uuid":"40f6d97a-1b81-4074-8343-71a5623b46be"}
{"caller":"extension.go:494","err":"sending string logs: writing logs: transport error sending logs: rpc error: code = Internal desc = grpc: error while marshaling: proto: field \"kolide.agent.LogCollection.Log.Data\" contains invalid UTF-8","severity":"info","ts":"2020-03-19T09:46:24.2203495Z"}
{"caller":"request_queries.go:133","err":null,"method":"RequestQueries","reauth":false,"res":"{\"queries\":{}}","severity":"debug","took":"29.8889ms","ts":"2020-03-19T09:46:30.8535053Z","uuid":"74066323-9566-4e30-9d56-be3c7d9fc525"}
{"caller":"request_queries.go:133","err":null,"method":"RequestQueries","reauth":false,"res":"{\"queries\":{}}","severity":"debug","took":"32.7023ms","ts":"2020-03-19T09:46:41.2258338Z","uuid":"c90a6fbf-ea71-4993-aa9f-6a5f6b006803"}
{"caller":"request_queries.go:133","err":null,"method":"RequestQueries","reauth":false,"res":"{\"queries\":{}}","severity":"debug","took":"30.8766ms","ts":"2020-03-19T09:46:52.6475113Z","uuid":"402dd069-77f5-4034-9a04-98aec1f56c45"}
{"caller":"request_queries.go:133","err":null,"method":"RequestQueries","reauth":false,"res":"{\"queries\":{}}","severity":"debug","took":"30.3889ms","ts":"2020-03-19T09:47:05.2297429Z","uuid":"75efbae4-4699-483e-93a0-f335fee6cb41"}
{"caller":"request_queries.go:133","err":null,"method":"RequestQueries","reauth":false,"res":"{\"queries\":{}}","severity":"debug","took":"29.0314ms","ts":"2020-03-19T09:47:18.5264584Z","uuid":"7e8c8d50-3f12-4023-985c-c830aa435a1b"}
{"caller":"query_target.go:71","msg":"server does not implement GetTargets","severity":"debug","ts":"2020-03-19T09:47:30.0738021Z"}
{"caller":"request_queries.go:133","err":null,"method":"RequestQueries","reauth":false,"res":"{\"queries\":{}}","severity":"debug","took":"46.5929ms","ts":"2020-03-19T09:47:30.5033555Z","uuid":"9633aa80-b003-4651-9478-d5d6014ada96"}
{"caller":"publish_logs.go:157","err":"rpc error: code = Internal desc = grpc: error while marshaling: proto: field \"kolide.agent.LogCollection.Log.Data\" contains invalid UTF-8","errcode":"","logType":"string","log_count":4530,"message":"","method":"PublishLogs","reauth":false,"severity":"info","took":"8.4364ms","ts":"2020-03-19T09:47:33.4980049Z","uuid":"d505fbb3-7326-4cd5-8c3b-0e8105407cf5"}
{"caller":"extension.go:494","err":"sending string logs: writing logs: transport error sending logs: rpc error: code = Internal desc = grpc: error while marshaling: proto: field \"kolide.agent.LogCollection.Log.Data\" contains invalid UTF-8","severity":"info","ts":"2020-03-19T09:47:33.5020486Z"}
{"caller":"request_queries.go:133","err":null,"method":"RequestQueries","reauth":false,"res":"{\"queries\":{}}","severity":"debug","took":"29.5181ms","ts":"2020-03-19T09:47:41.2621737Z","uuid":"febf6a67-e727-48bd-b5de-6abafe0b03e0"}
{"caller":"scheduler.cpp:96","component":"osquery","level":"stderr","msg":"I0319 10:47:49.286285  7356 scheduler.cpp:96] Executing scheduled query pack/Software/Installed software list - Windows: select name as software_name, version as software_version from programs;","severity":"debug","ts":"2020-03-19T09:47:49.2939863Z"}
{"caller":"process_ops.cpp:99","component":"osquery","level":"stderr","msg":"I0319 10:47:49.298260  7356 process_ops.cpp:99] No account name provided","severity":"info","ts":"2020-03-19T09:47:49.3016493Z"}
{"caller":"logged_in_users.cpp:129","component":"osquery","level":"stderr","msg":"I0319 10:47:49.298260  7356 logged_in_users.cpp:129] Error converting username to SID","severity":"info","ts":"2020-03-19T09:47:49.3025674Z"}
{"caller":"request_queries.go:133","err":null,"method":"RequestQueries","reauth":false,"res":"{\"queries\":{}}","severity":"debug","took":"30.894ms","ts":"2020-03-19T09:47:50.9465303Z","uuid":"aaf33ff4-6b2d-42fc-bda8-449650f93766"}

polak785 avatar Mar 20 '20 16:03 polak785

I would say that indicates it's not a launcher bug. Launcher is sending a log, and fleet is rejecting it. Launcher is then resending it. The reason you need to drop the database, is because launcher will always try to resend stored logs.

I am not immediately sure whether I think launcher should drop the log when the server rejects it. It's the sort of thing that would cause logs to be dropped, and potentially break any kind of monitoring.

My gut sense is that fleet should accept these and either fix them, and record the data, or discard them. @zwass what do you think here?

directionless avatar Mar 20 '20 17:03 directionless

As reference, the original issue has been created on fleet : https://github.com/kolide/fleet/issues/2014

But as with osquery this works fine, I think a fix should be done on Kolide side to accept the log and not discard it.

polak785 avatar Mar 20 '20 19:03 polak785

This does not look like an error on the Fleet side. Notice the error message:

rpc error: code = Internal desc = grpc: error while marshaling: proto: field "kolide.agent.LogCollection.Log.Data" contains invalid UTF-8

There wouldn't be any marshalling on the Fleet side, only unmarshalling.

zwass avatar Mar 20 '20 19:03 zwass

rpc error: code = Internal desc = grpc: error while marshaling: proto: field "kolide.agent.LogCollection.Log.Data" contains invalid UTF-8

Hrm. I'll take a closer look. @zwass What do you think launcher should do? We talked about it briefly in https://github.com/kolide/launcher/pull/481

directionless avatar Mar 20 '20 22:03 directionless

Either escape it in some way, or drop the invalid characters I suppose. Probably best to do this only after receiving the error so we don't add an extra encoding/scanning step for every single log.

zwass avatar Mar 20 '20 23:03 zwass

I seem to be experience this error based on logs im seeing from the service on windows host, has a workaround or solution been determined yet, im running the most recent version of launcher (0.11.9) and fleet version 2.6.0

wkleinhenz avatar Apr 17 '20 18:04 wkleinhenz

For my concern, the bug has been fixed with latest osquery release (4.3.0).

The related PR is https://github.com/osquery/osquery/pull/6190

All encoding issues should be fixed when the PR https://github.com/osquery/osquery/pull/6338 will be released

polak785 avatar Apr 27 '20 11:04 polak785

Is there any progress on this? 4.5.1 and all my event logs across all my Windows servers are seeing thousands of:

caller=log.go:124 ts=2021-01-19T16:50:26.127444Z caller=level.go:63 level=info caller=extension.go:494 err="sending string logs: writing logs: transport error sending logs: rpc error: code = Internal desc = grpc: error while marshaling: proto: field "kolide.agent.LogCollection.Log.Data" contains invalid UTF-8"

caller=log.go:124 ts=2021-01-19T16:50:26.1264442Z caller=level.go:63 level=info caller=publish_logs.go:179 method=PublishLogs uuid=6e2ccedd-1f13-4ea9-9bcc-37d9611e3eff logType=string log_count=1103 message= errcode= reauth=false err="rpc error: code = Internal desc = grpc: error while marshaling: proto: field "kolide.agent.LogCollection.Log.Data" contains invalid UTF-8" took=14.0018ms

B3DTech avatar Jan 19 '21 16:01 B3DTech

@B3DTech FWIW we (github.com/fleetdm) are working on an osquery autoupdater that takes a different approach by letting osquery do the heavy lifting. This (I believe) will help resolve issues like this one with the grpc transport in Launcher.

zwass avatar Jan 20 '21 19:01 zwass

@B3DTech The underlying issue here, is that osquery sometimes produces data that is not utf-8 compatible. This is generally believed to be a bug. What do you think launcher should do when that happens?

directionless avatar Jan 20 '21 21:01 directionless

My problem is that it isn't "sometimes". It's my event logs on every machine are getting flooded with 2-4 of these message every minute. I'm not sure what should happen because I don't know why it's happening or what it's choking on. Osquery shouldn't be failing to produce UTF-8 messages.

And additionally, my 4.5.1 clients don't seem to be sending event logs, but my 4.2 clients are.

B3DTech avatar Jan 20 '21 22:01 B3DTech

Regardless of whether osquery "should" produce non-utf8, it does. So what do you think launcher should do? Attempt to repair the data? Drop it on the floor? Encode it?

directionless avatar Jan 20 '21 22:01 directionless

I've updated #481. It should now attempt to repair the data, then redact the non-utf8 characters. I can't easily test it though

directionless avatar Jan 21 '21 02:01 directionless

It was determined that there is mis-formatted osquery results in the osquery store from the previous version of osquery, and Launcher is still trying to send that. Removing the C:\Program Files\Kolide\Launcher-so-launcher\data\ directory and restarting the service fixed the issue - no more UTF8 events.

B3DTech avatar Jan 29 '21 20:01 B3DTech

I think this is as resolved as it's getting

directionless avatar Dec 20 '22 12:12 directionless