hydra icon indicating copy to clipboard operation
hydra copied to clipboard

Persister.transaction does not rollback on panic

Open mitar opened this issue 4 years ago • 13 comments

Preflight checklist

Describe the bug

I am not sure if this is by design, but when looking at Hydra's code, Persister.transaction does not handle panic and does not call c.TX.Rollback on it. I think it would be cleaner if it would be handled here instead of potentially never handling it because panic got handled somewhere else and never killed the process/context.

Reproducing the bug

This is a design issue.

Relevant log output

No response

Relevant configuration

No response

Version

observed on current master branch

On which operating system are you observing this issue?

No response

In which environment are you deploying?

No response

Additional Context

See here an example of a similar function in Gorm which does similar handling, but handles panic as well.

mitar avatar Nov 15 '21 12:11 mitar

True, it would be way cleaner to do rollback on panic. Anyone working on this, please make the changes in https://github.com/ory/x/blob/master/popx/transaction.go and then use that function in Hydra instead of the current transaction implementation.

zepatrik avatar Nov 15 '21 13:11 zepatrik

I am not familiar with popx, but I would suggest that this function here is changed to:

func (p *Persister) transaction(ctx context.Context, f func(ctx context.Context, c *pop.Connection) error) (err error) {
	isNested := true
        panicked := true

	c, ok := ctx.Value(transactionContextKey).(*pop.Connection)
	if !ok {
		isNested = false

		c, err = p.conn.WithContext(ctx).NewTransaction()
		if err != nil {
			return errorsx.WithStack(err)
		}
		defer func() {
			if panicked || err != nil {
				if !isNested {
					// It would be better if there was RollbackUnlessCommitted.
					rollbackErr := errorsx.WithStack(c.TX.Rollback())
					if rollbackErr != nil {
						err = rollbackErr
					}
				}
			}
		}()
	}

	err = f(context.WithValue(ctx, transactionContextKey, c), c)
	if err == nil {
		// commit if there is no wrapping transaction
		if !isNested {
			err = errorsx.WithStack(c.TX.Commit())
		}
        }

	return
}

mitar avatar Nov 15 '21 23:11 mitar

Does Ory Hydra actually have a panic? If not, I'd close this as a dupe of https://github.com/ory/fosite/issues/637 and also my comment https://github.com/ory/fosite/issues/637#issuecomment-973838568

aeneasr avatar Nov 22 '21 09:11 aeneasr

Are you saying that in no library Hydra is calling, they ever use panic or no code there will ever panic?

Moreover, Go panics also when there are other issues, like nil pointer dereferencing or interface assertion mismatch.

mitar avatar Nov 22 '21 10:11 mitar

Are you saying that in no library Hydra is calling, they ever use panic or no code there will ever panic?

Hydra calls into pop, which calls into sglx, which has many uses of panic:

https://github.com/jmoiron/sqlx/search?q=panic

mitar avatar Nov 22 '21 10:11 mitar

What I meant was if you actually have observed a panic or if this is a hypothetical fix. Please also do keep in mind that if we begin a transaction, the transaction is not "commited" until you actually do "commit". So a panic will automagically rollback the transaction if the connection is closed (on panic). So I don't really understand what bug case this would solve? Or did I miss a bug here (which is why I asked)?

aeneasr avatar Nov 22 '21 10:11 aeneasr

I have observed panics while developing our own storage backend and when I had bugs. Hopefully I have ironed them out, but I also observed that it didn't cleanup things well. So I worry that if I have any more bug around, things might be problematic in production.

So a panic will automagically rollback the transaction if the connection is closed (on panic).

Unless you are using connection pooling, then transaction might be kept around and the next query hitting the same connection might be in for a surprise.

mitar avatar Nov 22 '21 14:11 mitar

Unless you are using connection pooling, then transaction might be kept around and the next query hitting the same connection might be in for a surprise.

Oh now I see the issue, @zepatrik also made me aware that the problem is connection pooling in combination with panic recovery in the middleware, in combination with a panic in the storage adapter. The good news is, in Ory Hydra we do not have panic recovery because we actually want the system to crash in order to have a high impact issue which must be resolved, as panics are always very bad due to lack of controlled execution flow! Still possible to address this in Fosite though. As I said already though, the refactor will be substantial and it only covers very rare edge cases, so it's a risk / reward decision that has to be made.

aeneasr avatar Nov 22 '21 15:11 aeneasr

The good news is, in Ory Hydra we do not have panic recovery because we actually want the system to crash in order to have a high impact issue which must be resolved, as panics are always very bad due to lack of controlled execution flow!

Oh, I am not saying that you should handle the panic, only that you should call rollback in defer and then keep the panic propagating. I understand that killing the process probably addresses the connection pooling issue (although not sure what happens if somebody is using pg_bouncer, I hope it cleans up the transaction automatically when connection dies), but I think that it could help with storage backends which are not transactional (like mongo) but one wants to take advantage of transaction API to manually cleanup things. But that is pure speculation.

At least for Hydra, the change necessary to call rollback on panic is relatively small, I posted it above. Not sure why we would not do that. For Fosite, I can look into that a bit and see how much work it really is.

mitar avatar Nov 22 '21 22:11 mitar

Ok, sounds good :)

aeneasr avatar Nov 23 '21 08:11 aeneasr

Done for Fosite in https://github.com/ory/fosite/pull/638.

mitar avatar Nov 24 '21 23:11 mitar

So I made this for Fosite, but for Hydra the change still has to be done, see: https://github.com/ory/hydra/issues/2857#issuecomment-969435112

mitar avatar Jun 19 '23 11:06 mitar

thanks, reopening!

kmherrmann avatar Jun 19 '23 11:06 kmherrmann