hydra
hydra copied to clipboard
Persister.transaction does not rollback on panic
Preflight checklist
- [X] I could not find a solution in the existing issues, docs, nor discussions.
- [X] I agree to follow this project's Code of Conduct.
- [X] I have read and am following this repository's Contribution Guidelines.
- [ ] This issue affects my Ory Cloud project.
- [X] I have joined the Ory Community Slack.
- [ ] I am signed up to the Ory Security Patch Newsletter.
Describe the bug
I am not sure if this is by design, but when looking at Hydra's code, Persister.transaction does not handle panic and does not call c.TX.Rollback on it. I think it would be cleaner if it would be handled here instead of potentially never handling it because panic got handled somewhere else and never killed the process/context.
Reproducing the bug
This is a design issue.
Relevant log output
No response
Relevant configuration
No response
Version
observed on current master branch
On which operating system are you observing this issue?
No response
In which environment are you deploying?
No response
Additional Context
See here an example of a similar function in Gorm which does similar handling, but handles panic as well.
True, it would be way cleaner to do rollback on panic. Anyone working on this, please make the changes in https://github.com/ory/x/blob/master/popx/transaction.go and then use that function in Hydra instead of the current transaction implementation.
I am not familiar with popx, but I would suggest that this function here is changed to:
func (p *Persister) transaction(ctx context.Context, f func(ctx context.Context, c *pop.Connection) error) (err error) {
isNested := true
panicked := true
c, ok := ctx.Value(transactionContextKey).(*pop.Connection)
if !ok {
isNested = false
c, err = p.conn.WithContext(ctx).NewTransaction()
if err != nil {
return errorsx.WithStack(err)
}
defer func() {
if panicked || err != nil {
if !isNested {
// It would be better if there was RollbackUnlessCommitted.
rollbackErr := errorsx.WithStack(c.TX.Rollback())
if rollbackErr != nil {
err = rollbackErr
}
}
}
}()
}
err = f(context.WithValue(ctx, transactionContextKey, c), c)
if err == nil {
// commit if there is no wrapping transaction
if !isNested {
err = errorsx.WithStack(c.TX.Commit())
}
}
return
}
Does Ory Hydra actually have a panic? If not, I'd close this as a dupe of https://github.com/ory/fosite/issues/637 and also my comment https://github.com/ory/fosite/issues/637#issuecomment-973838568
Are you saying that in no library Hydra is calling, they ever use panic or no code there will ever panic?
Moreover, Go panics also when there are other issues, like nil pointer dereferencing or interface assertion mismatch.
Are you saying that in no library Hydra is calling, they ever use panic or no code there will ever panic?
Hydra calls into pop, which calls into sglx, which has many uses of panic:
https://github.com/jmoiron/sqlx/search?q=panic
What I meant was if you actually have observed a panic or if this is a hypothetical fix. Please also do keep in mind that if we begin a transaction, the transaction is not "commited" until you actually do "commit". So a panic will automagically rollback the transaction if the connection is closed (on panic). So I don't really understand what bug case this would solve? Or did I miss a bug here (which is why I asked)?
I have observed panics while developing our own storage backend and when I had bugs. Hopefully I have ironed them out, but I also observed that it didn't cleanup things well. So I worry that if I have any more bug around, things might be problematic in production.
So a panic will automagically rollback the transaction if the connection is closed (on panic).
Unless you are using connection pooling, then transaction might be kept around and the next query hitting the same connection might be in for a surprise.
Unless you are using connection pooling, then transaction might be kept around and the next query hitting the same connection might be in for a surprise.
Oh now I see the issue, @zepatrik also made me aware that the problem is connection pooling in combination with panic recovery in the middleware, in combination with a panic in the storage adapter. The good news is, in Ory Hydra we do not have panic recovery because we actually want the system to crash in order to have a high impact issue which must be resolved, as panics are always very bad due to lack of controlled execution flow! Still possible to address this in Fosite though. As I said already though, the refactor will be substantial and it only covers very rare edge cases, so it's a risk / reward decision that has to be made.
The good news is, in Ory Hydra we do not have panic recovery because we actually want the system to crash in order to have a high impact issue which must be resolved, as panics are always very bad due to lack of controlled execution flow!
Oh, I am not saying that you should handle the panic, only that you should call rollback in defer and then keep the panic propagating. I understand that killing the process probably addresses the connection pooling issue (although not sure what happens if somebody is using pg_bouncer, I hope it cleans up the transaction automatically when connection dies), but I think that it could help with storage backends which are not transactional (like mongo) but one wants to take advantage of transaction API to manually cleanup things. But that is pure speculation.
At least for Hydra, the change necessary to call rollback on panic is relatively small, I posted it above. Not sure why we would not do that. For Fosite, I can look into that a bit and see how much work it really is.
Ok, sounds good :)
Done for Fosite in https://github.com/ory/fosite/pull/638.
So I made this for Fosite, but for Hydra the change still has to be done, see: https://github.com/ory/hydra/issues/2857#issuecomment-969435112
thanks, reopening!