mattermost-plugin-gitlab
mattermost-plugin-gitlab copied to clipboard
[GH-261] Handle revoked token
Summary
If a GitLab API call fails with 401 {error: invalid_token}, disconnect the user and DM the user a message telling how to reconnect their account.
Ticket Link
Fixes https://github.com/mattermost/mattermost-plugin-gitlab/issues/261
Questions:
-
I have not handled the case where the token is both expired and revoked. Here,
p.checkAndRefreshTokenattempts to refresh the token and the call tosrc.Tokenfails. However, if I handle the error and callp.disconnectGitlabAccount, it will callp.checkAndRefreshTokenagain, creating an infnite loop. How would you suggest to resolve this? -
In the case where the error occurs during the execution of a slash command, I have not changed the existing slash command error responses. I just send the DM in addition. So if you get the error during
/gitlab me, you'll get the new message in the DM channel but the command response will still be "Encountered an error getting your GitLab profile.". Should I change the command response to a similar or identical message to what I'm sending in the DM? -
What test coverage is needed?
Hello @MatthewDorner,
Thanks for your pull request! A Core Committer will review your pull request soon. For code contributions, you can learn more about the review process here.
Codecov Report
Patch coverage: 21.77% and project coverage change: -1.18 :warning:
Comparison is base (
de004a9) 32.20% compared to head (865c702) 31.03%.
Additional details and impacted files
@@ Coverage Diff @@
## master #308 +/- ##
==========================================
- Coverage 32.20% 31.03% -1.18%
==========================================
Files 21 21
Lines 3465 3757 +292
==========================================
+ Hits 1116 1166 +50
- Misses 2242 2471 +229
- Partials 107 120 +13
| Impacted Files | Coverage Δ | |
|---|---|---|
| server/api.go | 21.17% <0.00%> (-1.83%) |
:arrow_down: |
| server/flow.go | 0.00% <0.00%> (ø) |
|
| server/telemetry.go | 0.00% <0.00%> (ø) |
|
| server/plugin.go | 15.22% <18.35%> (-0.48%) |
:arrow_down: |
| server/command.go | 24.72% <40.90%> (+1.11%) |
:arrow_up: |
| server/webhook.go | 40.30% <42.10%> (+0.18%) |
:arrow_up: |
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Do you have feedback about the report comment? Let us know in this issue.
This PR has been automatically labelled "stale" because it hasn't had recent activity. A core team member will check in on the status of the PR to help with questions. Thank you for your contribution!
- I have not handled the case where the token is both expired and revoked. Here,
p.checkAndRefreshTokenattempts to refresh the token and the call tosrc.Tokenfails. However, if I handle the error and callp.disconnectGitlabAccount, it will callp.checkAndRefreshTokenagain, creating an infnite loop. How would you suggest to resolve this?
What is the side effect of "not handling the case where the token is both expired and revoked"? Note that the plugin's API is called multiple times when the frontend requests data to fill the left hand side GitLab metrics, so it will always be tried more than once during this time. You can refresh your browser to trigger this data refresh.
There may be the case where we end up sending more than one bot DM here, due to more than one request trying to use the access token before it is removed from the kv store. If the server is running in a high availability environment, there may be some lag on the database read for rejecting the token, which could cause this scenario.
I'm thinking that sending multiple DMs is an ok outcome, as it's relatively harmless, and any defense around this is going to be complicated and potentially error-prone.
Also, we will want to make sure the changes in this PR work with the changes from https://github.com/mattermost/mattermost-plugin-gitlab/pull/298
- In the case where the error occurs during the execution of a slash command, I have not changed the existing slash command error responses. I just send the DM in addition. So if you get the error during
/gitlab me, you'll get the new message in the DM channel but the command response will still be "Encountered an error getting your GitLab profile.". Should I change the command response to a similar or identical message to what I'm sending in the DM?
What you describe as the current behavior of this PR seems good to me :+1:
- What test coverage is needed?
I would test the handleGitlabError function to assert that it handles the error properly
What is the side effect of "not handling the case where the token is both expired and revoked"? Note that the plugin's API is called multiple times when the frontend requests data to fill the left hand side GitLab metrics, so it will always be tried more than once during this time. You can refresh your browser to trigger this data refresh.
Each API call will get to checkAndRefreshToken, see that the token needs to be refreshed, but when it tries to refresh, the call to src.Token() will fail because the token is revoked and can't be refreshed. So each request will fail. This is the current behavior in this PR if the token is both expired and revoked. And I'm not sure what exact circumstances a user would revoke a token, but it seems like it might coincide with an expired token (they stop using the plugin for a while or something) so that this can be a realistic scenario.
Also, we will want to make sure the changes in this PR work with the changes from https://github.com/mattermost/mattermost-plugin-gitlab/pull/298
Yeah, I waited for that PR to be merged and this PR is based on code that includes those changes. But the issue with the "revoked and expired" case is how to make this work with the checkAndRefreshToken method from that PR. To clarify:
- currently in this PR, the case is not handled which means all requests will fail if the token is revoked and expired
- if I try to handle the error from the
src.Token()call withincheckAndRefreshTokenthe same way I have elsewhere, it will cause an infinite loop in the code due to the reasons described in my opening post
There is probably a workaround I'm just not seeing?
What is the side effect of "not handling the case where the token is both expired and revoked"? Note that the plugin's API is called multiple times when the frontend requests data to fill the left hand side GitLab metrics, so it will always be tried more than once during this time. You can refresh your browser to trigger this data refresh.
Each API call will get to
checkAndRefreshToken, see that the token needs to be refreshed, but when it tries to refresh, the call tosrc.Token()will fail because the token is revoked and can't be refreshed. So each request will fail. This is the current behavior in this PR if the token is both expired and revoked. And I'm not sure what exact circumstances a user would revoke a token, but it seems like it might coincide with an expired token (they stop using the plugin for a while or something) so that this can be a realistic scenario.
@MatthewDorner Is there any risk with this situation? If it's revoked & expired, and this happens, is the kvstore then in an invalid state? It seems like a non-issue to me if we invalidate the token on our side so we don't try to use it again.
Also, we will want to make sure the changes in this PR work with the changes from #298
Yeah, I waited for that PR to be merged and this PR is based on code that includes those changes. But the issue with the "revoked and expired" case is how to make this work with the
checkAndRefreshTokenmethod from that PR. To clarify:
- currently in this PR, the case is not handled which means all requests will fail if the token is revoked and expired
- if I try to handle the error from the
src.Token()call withincheckAndRefreshTokenthe same way I have elsewhere, it will cause an infinite loop in the code due to the reasons described in my opening postThere is probably a workaround I'm just not seeing?
"This case is not handled"
There's something I'm missing here. I don't understand how the case is not handled. I understand that the token won't work in this case. Are we getting the "invalid token" error from GitLab in this case? Shouldn't we just delete it from our database and tell the user about it like usual? Is there something stopping us from doing this? Can we extract some code from the function causing the infinite loop to call in another context?
Also, please mention me in replies here so I get a notification for the mention :+1:
@mickmister In checkAndRefreshToken, we just check the expiration time of the token from the KV store to determine if it needs to be refreshed. If a token is invalid, we only know when we make a request and it returns the error from GitLab. So in order to reproduce this condition:
- shut down Mattermost server
- wait 2 hours, or modify the
expiryfield in the KV store_gitlabtokenrecord - on the GitLab side, revoke the token
- start server again
From my perspective, I feel like the issue is that checkAndRefreshToken from https://github.com/mattermost/mattermost-plugin-gitlab/pull/298 is called from getGitlabUserInfoByMattermostID, which ensures it's called before every API call, but isn't really part of the purpose of that function. And it means checkAndRefreshToken can't then use those methods such as in my case p.disconnectGitlabAccount without causing a loop.
I can think about it more, just wanted to see if something obvious occurred.
Are we getting the "invalid token" error from GitLab in this case? Shouldn't we just delete it from our database and tell the user about it like usual?
@mickmister We never get to the actual API call, as the call to src.Token() (calls GitLab to refresh the token since it's expired) in checkAndRefreshToken will fail and that error will be returned. If I add code in checkAndRefreshToken to check for that error and eventually call p.disconnectGitlabAccount, that will call getGitlabUserInfoByMattermostID which will call checkAndRefreshToken, which will again try to refresh the token and on and on. Sorry if it's confusing.
Are we getting the "invalid token" error from GitLab in this case? Shouldn't we just delete it from our database and tell the user about it like usual?
@mickmister We never get to the actual API call, as the call to
src.Token()(calls GitLab to refresh the token since it's expired) incheckAndRefreshTokenwill fail and that error will be returned. If I add code incheckAndRefreshTokento check for that error and eventually callp.disconnectGitlabAccount, that will callgetGitlabUserInfoByMattermostIDwhich will callcheckAndRefreshToken, which will again try to refresh the token and on and on. Sorry if it's confusing.
@MatthewDorner Yeah it would seem to me that getGitlabUserInfoByMattermostID should be a local operation, and not a "contact GitLab" operation. getGitlabUserInfoByMattermostID should not call checkAndRefreshToken, and any function calling getGitlabUserInfoByMattermostID that also cares about refreshing the token should explicitly call checkAndRefreshToken after the call getGitlabUserInfoByMattermostID has returned. Does this solve the issue?
@mickmister yes, that is one of the solutions that occurred to me. Basically, check for both conditions (expired and revoked) in the same place.
I didn't want to go that route immediately since it would be a larger scope of changes than I did so far, so I will work on it a bit and update. Thanks for the input.
@mickmister Think it would work to use a wrapper or decorator around the GitLab client calls? Otherwise, every method that makes GitLab calls must call checkAndRefreshToken in its setup stage, and then later check for the invalid token after any call it makes, or at least after the first call it will definitely make, even if it's only to call handleGitlabError(err error).
A wrapper function should be cleaner and use less code, but I assume it would hit the KV store for every GitLab call rather than only once per method that makes GitLab calls, but that may be negligible. And may be difficult in other ways I don't foresee.
@MatthewDorner Could we have a wrapper function around the http handler functions?
https://github.com/mattermost/mattermost-plugin-gitlab/blob/c5d107ec3962fd8ba53596d595c703922bd6f3e3/server/api.go#L52
We can rename checkAuth to checkMattermostAuth, then make a similar checkGitlabAuth function for certain routes. Note that since checkGitlabAuth would be a more expensive operation, we want to make that be the outer call when wrapping.
@mickmister Think it would work to use a wrapper or decorator around the GitLab client calls? Otherwise, every method that makes GitLab calls must call
checkAndRefreshTokenin its setup stage, and then later check for the invalid token after any call it makes, or at least after the first call it will definitely make, even if it's only to callhandleGitlabError(err error).A wrapper function should be cleaner and use less code, but I assume it would hit the KV store for every GitLab call rather than only once per method that makes GitLab calls, but that may be negligible. And may be difficult in other ways I don't foresee.
I'm not sure how this wrapper function would work, since each of the Gitlab client's methods return different types. To me, handleGitlabError is the wrapper function. We would probably need to abstract out the Gitlab client to include this wrapper behavior.
If the solution in my previous comment won't work, I'm thinking we should just go with the strategy designed in your first paragraph. Explicit error handling for each call to the external library Gitlab client makes sense to me.
@mickmister unless you have a strong preference, I think I like locating the GitLab token error handling closer to the GitLab client calls, and to ensure it will be called in more circumstances such as slash commands, even if that isn't strictly necessary due to the frontend constantly making those requests. Also I already started working on that version haha.
I'm thinking that sending multiple DMs is an ok outcome, as it's relatively harmless, and any defense around this is going to be complicated and potentially error-prone.
@mickmister yes, this will be the case when the revoked token is detected due to the 4 API requests from the frontend. It also spams the error log a bit with messages such as can't get GitLab user info from mattermost id because the user is already disconnected and it's still trying to process the other requests.
@mickmister Converting to draft because still having issues. The issue is related to the conversation https://github.com/mattermost/mattermost-plugin-gitlab/pull/298#discussion_r886170548 and case where the 4 API requests happen simultaneously, as well as the previous conversation in this PR about token being both expired and revoked.
While it was OK to have multiple token refreshes in the PR linked above, it is not OK once we are checking for and disconnecting the user based on an invalid token. What happens is one of the 4 requests will refresh the token, and the old token is now invalid. Then the 3 other goroutines try to refresh their (now invalid) tokens and (with the changes in this PR) the user gets disconnected. I'm not sure how the exact timing or concurrency works out, but this condition happens every time for me when I set my token to expire and then refresh the browser.
This is why it's a problem if getGitlabUserInfoByMattermostId is a local operation (and without locking anything): one goroutine can load the token from KV while another has already refreshed it and is about to store it, which means the first goroutine has loaded an invalid token.
I suspect this is also the reason for a user in the community server complaining of their error logs being spammed even after the token refresh fix, which resulted in this issue https://github.com/mattermost/mattermost-plugin-gitlab/issues/313. The first request would refresh the token and the other 3 would fail and write to the error log, every 2 hours. It just wouldn't break any functionality, but if we also disconnect the user every time we detect an invalid token it becomes a bigger issue, and maybe some concurrency control is required now.
The issue is made more difficult by the fact that GitLab's OAuth implementation does not allow a refresh_token to be used multiple times. It issues a new one upon each refresh. I guess the OAuth specification allows it to be done either way.
https://docs.gitlab.com/ee/api/oauth2.html
The Golang oauth2 library apparently auto-refreshes the token, but doesn't give you the new token back, so the auto-refresh doesn't work in cases like this where refresh_token is invalidated in this way and the token needs to be persistent. (and would be subject to the same concurrency issues I suppose). Anyway by detecting the need to refresh the token ahead of time we're avoiding that auto-refresh behavior.
Relevant thread here, although it's confusing because they're talking about a few different issues / cases at the same time: https://github.com/golang/oauth2/issues/84
This PR has been automatically labelled "stale" because it hasn't had recent activity. A core team member will check in on the status of the PR to help with questions. Thank you for your contribution!
The Golang
oauth2library apparently auto-refreshes the token, but doesn't give you the new token back
Doesn't the current solution for auto-refreshing the token in this project rely on us receiving a new version of the token from the library?
@mickmister We are refreshing the token manually. That auto refresh is a separate feature that's discussed in the thread I linked but I don't think it works here since the GitLab Oauth provider invalidates the refresh token, thus why we had to make the other recent token fix.
@MatthewDorner Sorry it's difficult for me to grasp based on our discussion here. What are the remaining items to resolve here? This PR adds a lot of value to the plugin, so I would like to see this merged 🙂
If the oauth2 library doesn't "give you back" the token when it's refreshed, does it at least mutate the existing one so it's updated?
To solve the issue of 4 requests being done simultaneously, maybe the UI can issue a single request first for this purpose, and then follow up with the 4 requests to fetch data afterwards. But we also have the issue of having multiple tabs open simultaneously, including the desktop app which always has 3 tabs open due to the different products supported now (Playbooks and Boards).
If it's an absolute requirement that we refresh only once per user, we can also use the cluster.Mutex function exposed here. For the mutex's key, we can use userID + "-refresh-token". This specific mutex is necessary to coordinate between different instances of the plugin running on separate nodes of a high availability cluster.
If we make it so the frontend always sends a request to a new plugin endpoint /token-refresh before fetching data, we can use the mutex to ensure only one instance of the plugin is checking the token at a given time. This allows us to make sure the token is only refreshed one time. @MatthewDorner What do you think of this?
cc @DHaussermann for visibility
This PR has been automatically labelled "stale" because it hasn't had recent activity. A core team member will check in on the status of the PR to help with questions. Thank you for your contribution!
@mickmister @DHaussermann
Using cluster.Mutex fixed the concurrency issue. As a heads up, I found a lot of instances of other people having this problem with OAuth providers that implement a strategy called "refresh token rotation" which seems to have gotten popular recently. Here is another example of the issue: https://github.com/dexidp/dex/pull/2352. So, this may appear in other plugins.
As for the potential /token-refresh endpoint, it could clean up the code so we don't have to check for expired token in each individual API handler on the backend, but the mutex by itself is what solves the concurrency issue. There are also downsides to adding the endpoint, as it adds responsibility onto the frontend and the API, so I'd like to consider and get input on what could be the best solution. Functionally, I think the existing solution suffices.
Also I see the test is failing. I think it's just something in the test code that needs to be corrected for my changes, looking into it now.
There are also downsides to adding the endpoint, as it adds responsibility onto the frontend and the API
I think we're good with the mutex on the backend, no need to add the complexity involving the frontend.
At the moment, it seems the backend is a bit bloated with the code duplication from the PR. Can we move the code to a middleware function, similar to p.checkAuth? Then we can apply the new middleware to the appropriate routes here https://github.com/mattermost/mattermost-plugin-gitlab/blob/2c3003baec9efac58d02299cd1ee2d29a7c2c08c/server/api.go#L52
Also I see the test is failing. I think it's just something in the test code that needs to be corrected for my changes, looking into it now.
It looks like the api.go file needs to be formatted with gofmt. My editor does this automatically whenever I save a file. That's strange if that's not the case for your editor
golangci-lint run ./...
server/api.go:112: File is not `gofmt`-ed with `-s` (gofmt)
mutex, err := cluster.NewMutex(p.API, context.UserID + "-refresh-token")
make: *** [Makefile:45: check-style] Error 1
At the moment, it seems the backend is a bit bloated with the code duplication from the PR. Can we move the code to a middleware function, similar to
p.checkAuth? Then we can apply the new middleware to the appropriate routes here
@mickmister OK, added checkToken
@mickmister
- Should we also have the mutex for the
/connectedendpoint? It doesn't useattachUserContextwhich is where I put the mutex code. But, it sometimes callsGetToDowhich calls GitLab, so it could need the mutex but I'd be adding it to the api handler itself. This seems to be an issue where/connectedis serving two separate functions. - For the '/todo' endpoint, I didn't use the
checkTokenwrapper there since I already put a call top.checkAndRefreshTokeninGetToDosince at that time I was trying to put the token checks as close as possible to the GitLab client calls. I could instead refactor so all the calls top.checkAndRefreshToken(there are 3 inserver/plugin.go) are moved out of theserver/plugin.gomethods and instead are placed inapi.goandcommand.go. Is there a preference?