mattermost-plugin-playbooks icon indicating copy to clipboard operation
mattermost-plugin-playbooks copied to clipboard

Improved logging

Open lieut-data opened this issue 2 years ago • 10 comments

Summary

This PR was triggered by some recent investigation into community error logs: image

I realized a few things, especially after digging into the code:

  • We log too many things as errors that should instead be warnings.
  • Unlike the main server, we didn't have any logs for when an incoming http event occurred.
  • It was hard to tie together logs from the same request, much less consistently identify the user in question.
  • We logged the "server caller" but not the "plugin caller"
  • We had 4 different logging patterns!!

This PR improves logging by:

  • Eliminate mlog, bot.Logger and pluginapi.Log in favour of just logrus. (See comments below.)
  • Consistently leverage logrus.WithError to pull out errors into a dedicated error property in the JSON log output.
  • Only log >= 500 API errors as errors, otherwise just a warning
  • Log a Received HTTP request before and a Handled HTTP request after each handler invocation.
  • Generate a request_id property on a logger for use to tie together logs from the request handler, e.g. errors with the original request that triggered it. With the history, we can now find when the request started, what user triggered it, and the HTTP method and URL. (A future PR might pass that logger into the service subsystem so that deeper logs there are grouped accordingly.)
  • Log the "plugin_caller" not just the "server caller" (see (see https://github.com/mattermost/mattermost-plugin-api/pull/118 for the implementation).
  • Documented philosophy around when to log an error vs. warning, etc.
  • Lots of tidying up of logging to pull fields out discretely.

Why logrus over mlog? I think structured logs are easier to do Logrus, plus you can scope and reuse such loggers:

logger := logrus.WithFields(logrus.Fields{
	"method":     r.Method,
	"url":        r.URL.String(),
	"user_id":    r.Header.Get("Mattermost-User-Id"),
	"request_id": requestID,
})
r = r.WithContext(context.WithValue(r.Context(), requestIDContextKey, requestID))

// Includes fields above
logger.Debug("Received HTTP request")

next.ServeHTTP(&recorder, r)

// Includes fields above
logger.WithField("status", recorder.statusCode).Debug("Handled HTTP request")

Note that this PR depends on https://github.com/mattermost/mattermost-plugin-api/pull/118. (I'll update the commit properly once that PR is merged.)

Example

Here's a batch of logs:

{"timestamp":"2022-06-27 13:28:33.290 -03:00","level":"debug","msg":"Received HTTP request","caller":"app/plugin_api.go:934","plugin_id":"playbooks","url":"/api/v0/bot/connect","user_id":"5aedxdss3brg3p6bekcbydjxiy","request_id":"mdpwesbhhibhd8ds79ibh5owoe","method":"GET","plugin_caller":"github.com/mattermost/mattermost-plugin-playbooks/server/api/logger.go:67"}
{"timestamp":"2022-06-27 13:28:33.290 -03:00","level":"debug","msg":"Received HTTP request","caller":"app/plugin_api.go:934","plugin_id":"playbooks","user_id":"5aedxdss3brg3p6bekcbydjxiy","request_id":"7ymj5g3ppirepjcyu6fcf8hf6r","method":"GET","url":"/api/v0/settings","plugin_caller":"github.com/mattermost/mattermost-plugin-playbooks/server/api/logger.go:67"}
{"timestamp":"2022-06-27 13:28:33.291 -03:00","level":"debug","msg":"Handled HTTP request","caller":"app/plugin_api.go:934","plugin_id":"playbooks","request_id":"7ymj5g3ppirepjcyu6fcf8hf6r","status":"200","method":"GET","url":"/api/v0/settings","user_id":"5aedxdss3brg3p6bekcbydjxiy","plugin_caller":"github.com/mattermost/mattermost-plugin-playbooks/server/api/logger.go:71"}
{"timestamp":"2022-06-27 13:28:33.354 -03:00","level":"debug","msg":"Received HTTP request","caller":"app/plugin_api.go:934","plugin_id":"playbooks","method":"POST","url":"/api/v0/query","user_id":"5aedxdss3brg3p6bekcbydjxiy","request_id":"dt9hjizn1idjbkaszu1k58n6be","plugin_caller":"github.com/mattermost/mattermost-plugin-playbooks/server/api/logger.go:67"}
{"timestamp":"2022-06-27 13:28:33.355 -03:00","level":"debug","msg":"Received HTTP request","caller":"app/plugin_api.go:934","plugin_id":"playbooks","user_id":"5aedxdss3brg3p6bekcbydjxiy","request_id":"jb5hwraufpdwufuu3smhr1zsyh","method":"GET","url":"/api/v0/stats/playbook?playbook_id=i9s3k1tarpywbef3qbsm8gibir","plugin_caller":"github.com/mattermost/mattermost-plugin-playbooks/server/api/logger.go:67"}
{"timestamp":"2022-06-27 13:28:33.367 -03:00","level":"debug","msg":"Handled HTTP request","caller":"app/plugin_api.go:934","plugin_id":"playbooks","url":"/api/v0/bot/connect","user_id":"5aedxdss3brg3p6bekcbydjxiy","request_id":"mdpwesbhhibhd8ds79ibh5owoe","status":"200","method":"GET","plugin_caller":"github.com/mattermost/mattermost-plugin-playbooks/server/api/logger.go:71"}
{"timestamp":"2022-06-27 13:28:33.369 -03:00","level":"warn","msg":"Not authorized","caller":"app/plugin_api.go:943","plugin_id":"playbooks","request_id":"dt9hjizn1idjbkaszu1k58n6be","error":"does not have permissions\ngithub.com/mattermost/mattermost-plugin-playbooks/server/app.init\n\tgithub.com/mattermost/mattermost-plugin-playbooks/server/app/permissions_service.go:15\nruntime.doInit\n\truntime/proc.go:6222\nruntime.doInit\n\truntime/proc.go:6199\nruntime.doInit\n\truntime/proc.go:6199\nruntime.main\n\truntime/proc.go:233\nruntime.goexit\n\truntime/asm_arm64.s:1263\nuserID 5aedxdss3brg3p6bekcbydjxiy to access playbook\ngithub.com/mattermost/mattermost-plugin-playbooks/server/app.(*PermissionsService).PlaybookViewWithPlaybook\n\tgithub.com/mattermost/mattermost-plugin-playbooks/server/app/permissions_service.go:356\ngithub.com/mattermost/mattermost-plugin-playbooks/server/app.(*PermissionsService).PlaybookView\n\tgithub.com/mattermost/mattermost-plugin-playbooks/server/app/permissions_service.go:343\ngithub.com/mattermost/mattermost-plugin-playbooks/server/api.(*RootResolver).Playbook\n\tgithub.com/mattermost/mattermost-plugin-playbooks/server/api/graphql_root.go:26\nreflect.Value.call\n\treflect/value.go:556\nreflect.Value.Call\n\treflect/value.go:339\ngithub.com/graph-gophers/graphql-go/internal/exec.execFieldSelection.func2\n\tgithub.com/graph-gophers/[email protected]/internal/exec/exec.go:211\ngithub.com/graph-gophers/graphql-go/internal/exec.execFieldSelection\n\tgithub.com/graph-gophers/[email protected]/internal/exec/exec.go:231\ngithub.com/graph-gophers/graphql-go/internal/exec.(*Request).execSelections.func1\n\tgithub.com/graph-gophers/[email protected]/internal/exec/exec.go:81\nruntime.goexit\n\truntime/asm_arm64.s:1263","plugin_caller":"github.com/mattermost/mattermost-plugin-playbooks/server/api/graphql_root.go:27"}
{"timestamp":"2022-06-27 13:28:33.369 -03:00","level":"error","msg":"Error executing request","caller":"app/plugin_api.go:940","plugin_id":"playbooks","request_id":"dt9hjizn1idjbkaszu1k58n6be","error":"graphql: Not authorized","operation":"Playbook","plugin_caller":"github.com/mattermost/mattermost-plugin-playbooks/server/api/graphql.go:135"}
{"timestamp":"2022-06-27 13:28:33.369 -03:00","level":"warn","msg":"Not authorized","caller":"app/plugin_api.go:943","plugin_id":"playbooks","request_id":"jb5hwraufpdwufuu3smhr1zsyh","error":"does not have permissions\ngithub.com/mattermost/mattermost-plugin-playbooks/server/app.init\n\tgithub.com/mattermost/mattermost-plugin-playbooks/server/app/permissions_service.go:15\nruntime.doInit\n\truntime/proc.go:6222\nruntime.doInit\n\truntime/proc.go:6199\nruntime.doInit\n\truntime/proc.go:6199\nruntime.main\n\truntime/proc.go:233\nruntime.goexit\n\truntime/asm_arm64.s:1263\nuserID 5aedxdss3brg3p6bekcbydjxiy to access playbook\ngithub.com/mattermost/mattermost-plugin-playbooks/server/app.(*PermissionsService).PlaybookViewWithPlaybook\n\tgithub.com/mattermost/mattermost-plugin-playbooks/server/app/permissions_service.go:356\ngithub.com/mattermost/mattermost-plugin-playbooks/server/app.(*PermissionsService).PlaybookView\n\tgithub.com/mattermost/mattermost-plugin-playbooks/server/app/permissions_service.go:343\ngithub.com/mattermost/mattermost-plugin-playbooks/server/api.(*StatsHandler).playbookStats\n\tgithub.com/mattermost/mattermost-plugin-playbooks/server/api/stats.go:96\ngithub.com/mattermost/mattermost-plugin-playbooks/server/api.withLogger.func1\n\tgithub.com/mattermost/mattermost-plugin-playbooks/server/api/logger.go:48\nnet/http.HandlerFunc.ServeHTTP\n\tnet/http/server.go:2084\nmain.(*Plugin).getErrorCounterHandler.func1.1\n\tgithub.com/mattermost/mattermost-plugin-playbooks/server/plugin.go:386\nnet/http.HandlerFunc.ServeHTTP\n\tnet/http/server.go:2084\ngithub.com/mattermost/mattermost-plugin-playbooks/server/api.MattermostAuthorizationRequired.func1\n\tgithub.com/m
attermost/mattermost-plugin-playbooks/server/api/api.go:103\nnet/http.HandlerFunc.ServeHTTP\n\tnet/http/server.go:2084\ngithub.com/mattermost/mattermost-plugin-playbooks/server/api.LogRequest.func1\n\tgithub.com/mattermost/mattermost-plugin-playbooks/server/api/logger.go:69\nnet/http.HandlerFunc.ServeHTTP\n\tnet/http/server.go:2084\ngithub.com/gorilla/mux.(*Router).ServeHTTP\n\tgithub.com/gorilla/[email protected]/mux.go:210\ngithub.com/mattermost/mattermost-plugin-playbooks/server/api.(*Handler).ServeHTTP\n\tgithub.com/mattermost/mattermost-plugin-playbooks/server/api/api.go:48\nmain.(*Plugin).ServeHTTP\n\tgithub.com/mattermost/mattermost-plugin-playbooks/server/plugin.go:84\ngithub.com/mattermost/mattermost-server/v6/plugin.(*hooksRPCServer).ServeHTTP\n\tgithub.com/mattermost/mattermost-server/[email protected]/plugin/client_rpc.go:456\nreflect.Value.call\n\treflect/value.go:556\nreflect.Value.Call\n\treflect/value.go:339\nnet/rpc.(*service).call\n\tnet/rpc/server.go:381\nruntime.goexit\n\truntime/asm_arm64.s:1263","plugin_caller":"github.com/mattermost/mattermost-plugin-playbooks/server/api/api.go:75"}
{"timestamp":"2022-06-27 13:28:33.369 -03:00","level":"debug","msg":"Handled HTTP request","caller":"app/plugin_api.go:934","plugin_id":"playbooks","method":"POST","url":"/api/v0/query","user_id":"5aedxdss3brg3p6bekcbydjxiy","request_id":"dt9hjizn1idjbkaszu1k58n6be","status":"200","plugin_caller":"github.com/mattermost/mattermost-plugin-playbooks/server/api/logger.go:71"}
{"timestamp":"2022-06-27 13:28:33.370 -03:00","level":"debug","msg":"Handled HTTP request","caller":"app/plugin_api.go:934","plugin_id":"playbooks","status":"403","user_id":"5aedxdss3brg3p6bekcbydjxiy","request_id":"jb5hwraufpdwufuu3smhr1zsyh","method":"GET","url":"/api/v0/stats/playbook?playbook_id=i9s3k1tarpywbef3qbsm8gibir","plugin_caller":"github.com/mattermost/mattermost-plugin-playbooks/server/api/logger.go:71"}

and now I can filter down to just one request_id to trace the messages in question:

cat logs/mattermost.log | jq -c 'select(.plugin_id=="playbooks") | select(.request_id=="dt9hjizn1idjbkaszu1k58n6be") | .msg'
"Received HTTP request"
"Not authorized"
"Error executing request"
"Handled HTTP request"

lieut-data avatar Jun 27 '22 15:06 lieut-data

Looks like using logrus also allows us to detect unsanitized user input making it's way into logs -- previously masked, perhaps? Or did just touching this code trigger the detection? Either way, I'll take a look at these :)

image

lieut-data avatar Jun 27 '22 16:06 lieut-data

One thing to note: right now, plugin_caller is accurate, but because of our handler semantics, often just points to a helper method, e.g. "plugin_caller":"github.com/mattermost/mattermost-plugin-playbooks/server/api/logger.go:67". I'd love to either find a way to notify the logger that this is a helper a such, or move these logs back closer to where they are actually invoked. (This isn't an issue where we don't abstract away logging.)

lieut-data avatar Jun 27 '22 16:06 lieut-data

Looking at my own example, we'll need to do something better for GraphQL in this case:

{"timestamp":"2022-06-27 13:28:33.369 -03:00","level":"error","msg":"Error executing request","caller":"app/plugin_api.go:940","plugin_id":"playbooks","request_id":"dt9hjizn1idjbkaszu1k58n6be","error":"graphql: Not authorized","operation":"Playbook","plugin_caller":"github.com/mattermost/mattermost-plugin-playbooks/server/api/graphql.go:135"}

lieut-data avatar Jun 27 '22 17:06 lieut-data

Added myself as a reviewer to this PR, since it contains multiple CodeQL alerts.

srkgupta avatar Jun 28 '22 06:06 srkgupta

Following up on the CodeQL issues at https://community.mattermost.com/core/pl/7zsxojoqpigif8odyuw5rdw41y.

lieut-data avatar Jun 28 '22 19:06 lieut-data

@srkgupta, any idea what's going on with the code scanner above?

lieut-data avatar Jun 30 '22 15:06 lieut-data

@lieut-data do you think we could separate in a fresh pr the fix of the log levels before we change the approach for logging?

trilopin avatar Aug 18 '22 07:08 trilopin

Thanks, @trilopin! I considered this, but thought it might be easier just to switch to the context model. For now, the underlying logger is still encoded within the http.Request context, but it's now accessed as per a *Context parameter passed to all APIs.

Hopefully this unblocks us :)

lieut-data avatar Aug 19 '22 16:08 lieut-data

(Note that I haven't switched us to mlog until that package can be used seamlessly across the RPC boundary like we can do for logrus.)

lieut-data avatar Aug 19 '22 16:08 lieut-data

@crspeller / @trilopin, I've fixed the build issues -- my upgrade of the plugin-api brought in the "new" EnsureBot and triggered https://community.mattermost.com/core/pl/9c8txjwkytdczc6kw8ngggyour. Fixed by making the OwnerId explicit for now.

@srkgupta, are the the security messages here just false positives again? What we can do to fix this from happening? I'm worried about all the future logging messages...

lieut-data avatar Aug 19 '22 17:08 lieut-data

@srkgupta, are the the security messages here just false positives again? What we can do to fix this from happening? I'm worried about all the future logging messages...

Hi @lieut-data

My apologies. Somehow this missed my attention before. I have taken a look at the new alerts and have dismissed them as Won't Fix (Not Affected). Unfortunately we need to manually review these alerts and ensure that we aren't accidentally leaking sensitive credentials in the logs (or) directly logging user controlled inputs without proper sanitisation.

I have added a follow-up task for this and will pick this up and check how we can reduce such false positives.

Thanks

srkgupta avatar Sep 06 '22 06:09 srkgupta

Thanks, @srkgupta! What's weird is that we've been logging all along without these warnings showing up in code reviews, and now that we're using logrus, CodeQL's automated checks seem to "discover" these logs. That makes me think all our existing logging in other applications is probably just getting ignored. Would we consider disabling these checks for now until they can catch all types of logs consistently? Otherwise, CodeQL may become very noisy in all subsequent PRs for Playbooks after this PR is merged.

lieut-data avatar Sep 06 '22 13:09 lieut-data

@lieut-data these alerts are triggered because of the new related changes done in the PR and the alerts are only triggered once. Once these alerts are analyzed and dismissed, they will not be re-triggered again in the subsequent PRs unless there are additional similar code changes. This is applicable for other repos too. We had already analysed all the previous alerts and have taken appropriate actions.

srkgupta avatar Sep 06 '22 13:09 srkgupta

Once these alerts are analyzed and dismissed, they will not be re-triggered again in the subsequent PRs unless there are additional similar code changes.

Thanks, @srkgupta! So, just to confirm, will new log messages trigger alerts in new PRs?

lieut-data avatar Sep 06 '22 13:09 lieut-data

will new log messages trigger alerts in new PRs?

As long as they are some how directly logging a user controlled input as is, yes they will trigger new alert.

srkgupta avatar Sep 06 '22 13:09 srkgupta

Thanks, @srkgupta! To explore this further, I created https://github.com/mattermost/mattermost-plugin-playbooks/pull/1465 using the existing logging infrastructure and referencing a user-controlled input that triggered a warning in the same way as this PR: https://github.com/mattermost/mattermost-plugin-playbooks/pull/1271/files#diff-6e3ba3ce59f140effc0a4e689feb56b2d7eb2f14529117cc7e20474962095028R296, but CodeQL did not generate a warning there.

lieut-data avatar Sep 06 '22 14:09 lieut-data

@gbochora, I'd love to get this merged after we cut v7.3 final. Would you be able to take a look at the approach here? It's already been partially reviewed, but we just want that extra confidence to avoid churn later on :)

lieut-data avatar Sep 06 '22 14:09 lieut-data

Thanks, @srkgupta! To explore this further, I created #1465 using the existing logging infrastructure and referencing a user-controlled input that triggered a warning in the same way as this PR: https://github.com/mattermost/mattermost-plugin-playbooks/pull/1271/files#diff-6e3ba3ce59f140effc0a4e689feb56b2d7eb2f14529117cc7e20474962095028R296, but CodeQL did not generate a warning there.

Got it. Thanks for the explanation @lieut-data. I might have to dig deep dive into the CodeQL tool to understand when & how such alerts are triggered. I will take this up along with the task and investigate this in detail. Meanwhile this PR can be merged as there are no open CodeQL alerts. Thanks.

srkgupta avatar Sep 07 '22 06:09 srkgupta

@gbochora, I'd love to get this merged after we cut v7.3 final. Would you be able to take a look at the approach here? It's already been partially reviewed, but we just want that extra confidence to avoid churn later on :)

Sure, I'm in the process of review and will finish today.

gbochora avatar Sep 08 '22 14:09 gbochora