mattermost-plugin-playbooks
mattermost-plugin-playbooks copied to clipboard
Improved logging
Summary
This PR was triggered by some recent investigation into community error logs:
I realized a few things, especially after digging into the code:
- We log too many things as
errors
that should instead be warnings. - Unlike the main server, we didn't have any logs for when an incoming http event occurred.
- It was hard to tie together logs from the same request, much less consistently identify the user in question.
- We logged the "server caller" but not the "plugin caller"
- We had 4 different logging patterns!!
This PR improves logging by:
- Eliminate
mlog
,bot.Logger
andpluginapi.Log
in favour of justlogrus
. (See comments below.) - Consistently leverage
logrus.WithError
to pull out errors into a dedicatederror
property in the JSON log output. - Only log >= 500 API errors as errors, otherwise just a warning
- Log a
Received HTTP request
before and aHandled HTTP request
after each handler invocation. - Generate a
request_id
property on a logger for use to tie together logs from the request handler, e.g. errors with the original request that triggered it. With the history, we can now find when the request started, what user triggered it, and the HTTP method and URL. (A future PR might pass that logger into the service subsystem so that deeper logs there are grouped accordingly.) - Log the "plugin_caller" not just the "server caller" (see (see https://github.com/mattermost/mattermost-plugin-api/pull/118 for the implementation).
- Documented philosophy around when to log an error vs. warning, etc.
- Lots of tidying up of logging to pull fields out discretely.
Why logrus over mlog
? I think structured logs are easier to do Logrus, plus you can scope and reuse such loggers:
logger := logrus.WithFields(logrus.Fields{
"method": r.Method,
"url": r.URL.String(),
"user_id": r.Header.Get("Mattermost-User-Id"),
"request_id": requestID,
})
r = r.WithContext(context.WithValue(r.Context(), requestIDContextKey, requestID))
// Includes fields above
logger.Debug("Received HTTP request")
next.ServeHTTP(&recorder, r)
// Includes fields above
logger.WithField("status", recorder.statusCode).Debug("Handled HTTP request")
Note that this PR depends on https://github.com/mattermost/mattermost-plugin-api/pull/118. (I'll update the commit properly once that PR is merged.)
Example
Here's a batch of logs:
{"timestamp":"2022-06-27 13:28:33.290 -03:00","level":"debug","msg":"Received HTTP request","caller":"app/plugin_api.go:934","plugin_id":"playbooks","url":"/api/v0/bot/connect","user_id":"5aedxdss3brg3p6bekcbydjxiy","request_id":"mdpwesbhhibhd8ds79ibh5owoe","method":"GET","plugin_caller":"github.com/mattermost/mattermost-plugin-playbooks/server/api/logger.go:67"}
{"timestamp":"2022-06-27 13:28:33.290 -03:00","level":"debug","msg":"Received HTTP request","caller":"app/plugin_api.go:934","plugin_id":"playbooks","user_id":"5aedxdss3brg3p6bekcbydjxiy","request_id":"7ymj5g3ppirepjcyu6fcf8hf6r","method":"GET","url":"/api/v0/settings","plugin_caller":"github.com/mattermost/mattermost-plugin-playbooks/server/api/logger.go:67"}
{"timestamp":"2022-06-27 13:28:33.291 -03:00","level":"debug","msg":"Handled HTTP request","caller":"app/plugin_api.go:934","plugin_id":"playbooks","request_id":"7ymj5g3ppirepjcyu6fcf8hf6r","status":"200","method":"GET","url":"/api/v0/settings","user_id":"5aedxdss3brg3p6bekcbydjxiy","plugin_caller":"github.com/mattermost/mattermost-plugin-playbooks/server/api/logger.go:71"}
{"timestamp":"2022-06-27 13:28:33.354 -03:00","level":"debug","msg":"Received HTTP request","caller":"app/plugin_api.go:934","plugin_id":"playbooks","method":"POST","url":"/api/v0/query","user_id":"5aedxdss3brg3p6bekcbydjxiy","request_id":"dt9hjizn1idjbkaszu1k58n6be","plugin_caller":"github.com/mattermost/mattermost-plugin-playbooks/server/api/logger.go:67"}
{"timestamp":"2022-06-27 13:28:33.355 -03:00","level":"debug","msg":"Received HTTP request","caller":"app/plugin_api.go:934","plugin_id":"playbooks","user_id":"5aedxdss3brg3p6bekcbydjxiy","request_id":"jb5hwraufpdwufuu3smhr1zsyh","method":"GET","url":"/api/v0/stats/playbook?playbook_id=i9s3k1tarpywbef3qbsm8gibir","plugin_caller":"github.com/mattermost/mattermost-plugin-playbooks/server/api/logger.go:67"}
{"timestamp":"2022-06-27 13:28:33.367 -03:00","level":"debug","msg":"Handled HTTP request","caller":"app/plugin_api.go:934","plugin_id":"playbooks","url":"/api/v0/bot/connect","user_id":"5aedxdss3brg3p6bekcbydjxiy","request_id":"mdpwesbhhibhd8ds79ibh5owoe","status":"200","method":"GET","plugin_caller":"github.com/mattermost/mattermost-plugin-playbooks/server/api/logger.go:71"}
{"timestamp":"2022-06-27 13:28:33.369 -03:00","level":"warn","msg":"Not authorized","caller":"app/plugin_api.go:943","plugin_id":"playbooks","request_id":"dt9hjizn1idjbkaszu1k58n6be","error":"does not have permissions\ngithub.com/mattermost/mattermost-plugin-playbooks/server/app.init\n\tgithub.com/mattermost/mattermost-plugin-playbooks/server/app/permissions_service.go:15\nruntime.doInit\n\truntime/proc.go:6222\nruntime.doInit\n\truntime/proc.go:6199\nruntime.doInit\n\truntime/proc.go:6199\nruntime.main\n\truntime/proc.go:233\nruntime.goexit\n\truntime/asm_arm64.s:1263\nuserID 5aedxdss3brg3p6bekcbydjxiy to access playbook\ngithub.com/mattermost/mattermost-plugin-playbooks/server/app.(*PermissionsService).PlaybookViewWithPlaybook\n\tgithub.com/mattermost/mattermost-plugin-playbooks/server/app/permissions_service.go:356\ngithub.com/mattermost/mattermost-plugin-playbooks/server/app.(*PermissionsService).PlaybookView\n\tgithub.com/mattermost/mattermost-plugin-playbooks/server/app/permissions_service.go:343\ngithub.com/mattermost/mattermost-plugin-playbooks/server/api.(*RootResolver).Playbook\n\tgithub.com/mattermost/mattermost-plugin-playbooks/server/api/graphql_root.go:26\nreflect.Value.call\n\treflect/value.go:556\nreflect.Value.Call\n\treflect/value.go:339\ngithub.com/graph-gophers/graphql-go/internal/exec.execFieldSelection.func2\n\tgithub.com/graph-gophers/[email protected]/internal/exec/exec.go:211\ngithub.com/graph-gophers/graphql-go/internal/exec.execFieldSelection\n\tgithub.com/graph-gophers/[email protected]/internal/exec/exec.go:231\ngithub.com/graph-gophers/graphql-go/internal/exec.(*Request).execSelections.func1\n\tgithub.com/graph-gophers/[email protected]/internal/exec/exec.go:81\nruntime.goexit\n\truntime/asm_arm64.s:1263","plugin_caller":"github.com/mattermost/mattermost-plugin-playbooks/server/api/graphql_root.go:27"}
{"timestamp":"2022-06-27 13:28:33.369 -03:00","level":"error","msg":"Error executing request","caller":"app/plugin_api.go:940","plugin_id":"playbooks","request_id":"dt9hjizn1idjbkaszu1k58n6be","error":"graphql: Not authorized","operation":"Playbook","plugin_caller":"github.com/mattermost/mattermost-plugin-playbooks/server/api/graphql.go:135"}
{"timestamp":"2022-06-27 13:28:33.369 -03:00","level":"warn","msg":"Not authorized","caller":"app/plugin_api.go:943","plugin_id":"playbooks","request_id":"jb5hwraufpdwufuu3smhr1zsyh","error":"does not have permissions\ngithub.com/mattermost/mattermost-plugin-playbooks/server/app.init\n\tgithub.com/mattermost/mattermost-plugin-playbooks/server/app/permissions_service.go:15\nruntime.doInit\n\truntime/proc.go:6222\nruntime.doInit\n\truntime/proc.go:6199\nruntime.doInit\n\truntime/proc.go:6199\nruntime.main\n\truntime/proc.go:233\nruntime.goexit\n\truntime/asm_arm64.s:1263\nuserID 5aedxdss3brg3p6bekcbydjxiy to access playbook\ngithub.com/mattermost/mattermost-plugin-playbooks/server/app.(*PermissionsService).PlaybookViewWithPlaybook\n\tgithub.com/mattermost/mattermost-plugin-playbooks/server/app/permissions_service.go:356\ngithub.com/mattermost/mattermost-plugin-playbooks/server/app.(*PermissionsService).PlaybookView\n\tgithub.com/mattermost/mattermost-plugin-playbooks/server/app/permissions_service.go:343\ngithub.com/mattermost/mattermost-plugin-playbooks/server/api.(*StatsHandler).playbookStats\n\tgithub.com/mattermost/mattermost-plugin-playbooks/server/api/stats.go:96\ngithub.com/mattermost/mattermost-plugin-playbooks/server/api.withLogger.func1\n\tgithub.com/mattermost/mattermost-plugin-playbooks/server/api/logger.go:48\nnet/http.HandlerFunc.ServeHTTP\n\tnet/http/server.go:2084\nmain.(*Plugin).getErrorCounterHandler.func1.1\n\tgithub.com/mattermost/mattermost-plugin-playbooks/server/plugin.go:386\nnet/http.HandlerFunc.ServeHTTP\n\tnet/http/server.go:2084\ngithub.com/mattermost/mattermost-plugin-playbooks/server/api.MattermostAuthorizationRequired.func1\n\tgithub.com/m
attermost/mattermost-plugin-playbooks/server/api/api.go:103\nnet/http.HandlerFunc.ServeHTTP\n\tnet/http/server.go:2084\ngithub.com/mattermost/mattermost-plugin-playbooks/server/api.LogRequest.func1\n\tgithub.com/mattermost/mattermost-plugin-playbooks/server/api/logger.go:69\nnet/http.HandlerFunc.ServeHTTP\n\tnet/http/server.go:2084\ngithub.com/gorilla/mux.(*Router).ServeHTTP\n\tgithub.com/gorilla/[email protected]/mux.go:210\ngithub.com/mattermost/mattermost-plugin-playbooks/server/api.(*Handler).ServeHTTP\n\tgithub.com/mattermost/mattermost-plugin-playbooks/server/api/api.go:48\nmain.(*Plugin).ServeHTTP\n\tgithub.com/mattermost/mattermost-plugin-playbooks/server/plugin.go:84\ngithub.com/mattermost/mattermost-server/v6/plugin.(*hooksRPCServer).ServeHTTP\n\tgithub.com/mattermost/mattermost-server/[email protected]/plugin/client_rpc.go:456\nreflect.Value.call\n\treflect/value.go:556\nreflect.Value.Call\n\treflect/value.go:339\nnet/rpc.(*service).call\n\tnet/rpc/server.go:381\nruntime.goexit\n\truntime/asm_arm64.s:1263","plugin_caller":"github.com/mattermost/mattermost-plugin-playbooks/server/api/api.go:75"}
{"timestamp":"2022-06-27 13:28:33.369 -03:00","level":"debug","msg":"Handled HTTP request","caller":"app/plugin_api.go:934","plugin_id":"playbooks","method":"POST","url":"/api/v0/query","user_id":"5aedxdss3brg3p6bekcbydjxiy","request_id":"dt9hjizn1idjbkaszu1k58n6be","status":"200","plugin_caller":"github.com/mattermost/mattermost-plugin-playbooks/server/api/logger.go:71"}
{"timestamp":"2022-06-27 13:28:33.370 -03:00","level":"debug","msg":"Handled HTTP request","caller":"app/plugin_api.go:934","plugin_id":"playbooks","status":"403","user_id":"5aedxdss3brg3p6bekcbydjxiy","request_id":"jb5hwraufpdwufuu3smhr1zsyh","method":"GET","url":"/api/v0/stats/playbook?playbook_id=i9s3k1tarpywbef3qbsm8gibir","plugin_caller":"github.com/mattermost/mattermost-plugin-playbooks/server/api/logger.go:71"}
and now I can filter down to just one request_id
to trace the messages in question:
cat logs/mattermost.log | jq -c 'select(.plugin_id=="playbooks") | select(.request_id=="dt9hjizn1idjbkaszu1k58n6be") | .msg'
"Received HTTP request"
"Not authorized"
"Error executing request"
"Handled HTTP request"
Looks like using logrus
also allows us to detect unsanitized user input making it's way into logs -- previously masked, perhaps? Or did just touching this code trigger the detection? Either way, I'll take a look at these :)
data:image/s3,"s3://crabby-images/fac94/fac943a54e162e74e5c5924961caa634e7723010" alt="image"
One thing to note: right now, plugin_caller
is accurate, but because of our handler semantics, often just points to a helper method, e.g. "plugin_caller":"github.com/mattermost/mattermost-plugin-playbooks/server/api/logger.go:67"
. I'd love to either find a way to notify the logger that this is a helper a such, or move these logs back closer to where they are actually invoked. (This isn't an issue where we don't abstract away logging.)
Looking at my own example, we'll need to do something better for GraphQL in this case:
{"timestamp":"2022-06-27 13:28:33.369 -03:00","level":"error","msg":"Error executing request","caller":"app/plugin_api.go:940","plugin_id":"playbooks","request_id":"dt9hjizn1idjbkaszu1k58n6be","error":"graphql: Not authorized","operation":"Playbook","plugin_caller":"github.com/mattermost/mattermost-plugin-playbooks/server/api/graphql.go:135"}
Added myself as a reviewer to this PR, since it contains multiple CodeQL alerts.
Following up on the CodeQL issues at https://community.mattermost.com/core/pl/7zsxojoqpigif8odyuw5rdw41y.
@srkgupta, any idea what's going on with the code scanner above?
@lieut-data do you think we could separate in a fresh pr the fix of the log levels before we change the approach for logging?
Thanks, @trilopin! I considered this, but thought it might be easier just to switch to the context model. For now, the underlying logger is still encoded within the http.Request
context, but it's now accessed as per a *Context
parameter passed to all APIs.
Hopefully this unblocks us :)
(Note that I haven't switched us to mlog
until that package can be used seamlessly across the RPC boundary like we can do for logrus.)
@crspeller / @trilopin, I've fixed the build issues -- my upgrade of the plugin-api brought in the "new" EnsureBot
and triggered https://community.mattermost.com/core/pl/9c8txjwkytdczc6kw8ngggyour. Fixed by making the OwnerId
explicit for now.
@srkgupta, are the the security messages here just false positives again? What we can do to fix this from happening? I'm worried about all the future logging messages...
@srkgupta, are the the security messages here just false positives again? What we can do to fix this from happening? I'm worried about all the future logging messages...
Hi @lieut-data
My apologies. Somehow this missed my attention before. I have taken a look at the new alerts and have dismissed them as Won't Fix (Not Affected). Unfortunately we need to manually review these alerts and ensure that we aren't accidentally leaking sensitive credentials in the logs (or) directly logging user controlled inputs without proper sanitisation.
I have added a follow-up task for this and will pick this up and check how we can reduce such false positives.
Thanks
Thanks, @srkgupta! What's weird is that we've been logging all along without these warnings showing up in code reviews, and now that we're using logrus, CodeQL's automated checks seem to "discover" these logs. That makes me think all our existing logging in other applications is probably just getting ignored. Would we consider disabling these checks for now until they can catch all types of logs consistently? Otherwise, CodeQL may become very noisy in all subsequent PRs for Playbooks after this PR is merged.
@lieut-data these alerts are triggered because of the new related changes done in the PR and the alerts are only triggered once. Once these alerts are analyzed and dismissed, they will not be re-triggered again in the subsequent PRs unless there are additional similar code changes. This is applicable for other repos too. We had already analysed all the previous alerts and have taken appropriate actions.
Once these alerts are analyzed and dismissed, they will not be re-triggered again in the subsequent PRs unless there are additional similar code changes.
Thanks, @srkgupta! So, just to confirm, will new log messages trigger alerts in new PRs?
will new log messages trigger alerts in new PRs?
As long as they are some how directly logging a user controlled input as is, yes they will trigger new alert.
Thanks, @srkgupta! To explore this further, I created https://github.com/mattermost/mattermost-plugin-playbooks/pull/1465 using the existing logging infrastructure and referencing a user-controlled input that triggered a warning in the same way as this PR: https://github.com/mattermost/mattermost-plugin-playbooks/pull/1271/files#diff-6e3ba3ce59f140effc0a4e689feb56b2d7eb2f14529117cc7e20474962095028R296, but CodeQL did not generate a warning there.
@gbochora, I'd love to get this merged after we cut v7.3 final. Would you be able to take a look at the approach here? It's already been partially reviewed, but we just want that extra confidence to avoid churn later on :)
Thanks, @srkgupta! To explore this further, I created #1465 using the existing logging infrastructure and referencing a user-controlled input that triggered a warning in the same way as this PR: https://github.com/mattermost/mattermost-plugin-playbooks/pull/1271/files#diff-6e3ba3ce59f140effc0a4e689feb56b2d7eb2f14529117cc7e20474962095028R296, but CodeQL did not generate a warning there.
Got it. Thanks for the explanation @lieut-data. I might have to dig deep dive into the CodeQL tool to understand when & how such alerts are triggered. I will take this up along with the task and investigate this in detail. Meanwhile this PR can be merged as there are no open CodeQL alerts. Thanks.
@gbochora, I'd love to get this merged after we cut v7.3 final. Would you be able to take a look at the approach here? It's already been partially reviewed, but we just want that extra confidence to avoid churn later on :)
Sure, I'm in the process of review and will finish today.