hydra icon indicating copy to clipboard operation
hydra copied to clipboard

Janitor does not clean up the sessions

Open mohsen3 opened this issue 3 years ago • 13 comments

I noticed that janitor $DSN --requests cleans up hydra_oauth2_authentication_request and hydra_oauth2_consent_request tables, but not hydra_oauth2_authentication_session. That latter one is the second largest table in our database right now. It does not seem to be a reason to keep those rows around.

Describe the solution you'd like

Remove the rows from hydra_oauth2_authentication_session that are no longer needed.

  • We should not delete a row if there is a foreign key to it from either hydra_oauth2_authentication_request or hydra_oauth2_consent_request
  • We should not delete a row if the session has not yet expired

Additional context

The two queries to clean up hydra_oauth2_authentication_request and hydra_oauth2_consent_request tables are placed here. It seems fine to me to follow them by a new query to clean up the sessions as well:

	var ls consent.LoginSession
	err = p.Connection(ctx).RawQuery(fmt.Sprintf(`
		DELETE
		FROM %[1]s
		WHERE NOT EXISTS
			(
			SELECT NULL
			FROM %[2]s
			WHERE %[2]s.login_session_id = %[1]s.id
			)
		AND NOT EXISTS
			(
			SELECT NULL
			FROM %[3]s
			WHERE %[3]s.login_session_id = %[1]s.id
			)
		AND authenticated_at < ?
		AND authenticated_at < ?
		`,
		(&ls).TableName(),
		(&lr).TableName(),
		(&cr).TableName()),
		time.Now().Add(-p.config.ConsentRequestMaxAge()),
		notAfter).Exec()

I think time.Now().Add(-p.config.ConsentRequestMaxAge()) may not be the right constraint for sessions since they may be useable beyond the lifespan of consent challenges (or am I wrong?) but the rest should be fine.

mohsen3 avatar Jun 04 '21 16:06 mohsen3

It seems to me that the values in hydra_oauth2_authentication_session are used for two different reasons:

  1. When doing a logout, this is use as id_token hint. It doesn't seem to be a big deal if hydra get's a request for an already deleted row here since the user is already technically logged out.
  2. When a user has rememberFor option set, this table is looked up to see if there is an active session with the corresponding session_id stored in cookie. If the session is not in the table, the user is required to reauthenticate.

The problem I see here is that rememberFor is not kept in the database. Instead, it's set as the maxAge of the cookie. Is there a way to safely delete the rows then? Do we keep the rememberFor it in another table?

mohsen3 avatar Jun 07 '21 18:06 mohsen3

True, we don't keep that around. I think we should add that field to the table!

Does the table have any foreign keys on e.g. login or consent requests?

aeneasr avatar Jun 08 '21 07:06 aeneasr

Both hydra_oauth2_authentication_request and hydra_oauth2_consent_request have a login_session_id pointing to the session table.

image

mohsen3 avatar Jun 08 '21 15:06 mohsen3

It seems that remember_for is kept in the hydra_oauth2_authentication_request_handled table

image

mohsen3 avatar Jun 08 '21 15:06 mohsen3

Ok, I don't remember if the constraint is cascade delete or cascade set null. If it is the latter, this table would be safe for purging old records

aeneasr avatar Jun 08 '21 16:06 aeneasr

It seems that it cascades deletes:

CONSTRAINT `hydra_oauth2_authentication_request_login_session_id_fk` 
FOREIGN KEY (`login_session_id`) 
REFERENCES `hydra_oauth2_authentication_session` (`id`) 
ON DELETE CASCADE

mohsen3 avatar Jun 08 '21 17:06 mohsen3

I think we can delete the sessions with ids returned from the following query before we cleanup the other two tables (requests and consents). Otherwise, we loose the connection between the hydra_oauth2_authentication_session and hydra_oauth2_authentication_request_handled that holds the remember_for. This triggers a cascade delete that removes rows from hydra_oauth2_authentication_request as well, which seems fine to me.

select s.id
from hydra_oauth2_authentication_session s, 
     hydra_oauth2_authentication_request r, 
     hydra_oauth2_authentication_request_handled h
where s.id = r.login_session_id
and r.challenge = h.challenge
and h.remember = 1
and now() - h.requested_at > h.remember_for;

mohsen3 avatar Jun 10 '21 19:06 mohsen3

The problem with the proposed query is that it would delete login session, which deletes authentication session, which deletes (I think) consent session, which deletes access tokens. So while the login session might be expired, it could still bel inked to an access token. This connection must be kept alive for the OIDC Front/Back-channel logouts to work.

Thus, I think that we can only safely delete all sessions that do not have foreign key references. If they do have foreign key references, it means they are still used.

Foreign keys references should be removed by the current janitor already.

What we would need however would be a expires_at timestamp in the login_session. Then we do basically delete all login sessions with expires_at < now() AND NOT EXISTS hydra_oauth2_authentication_request ...

aeneasr avatar Jun 11 '21 08:06 aeneasr

Deletes from hydra_oauth2_authentication_session are cascaded to hydra_oauth2_authentication_request, but not to hydra_oauth2_consent_request. So, tokens and other things dependent on consent should not be affected.

But you are right, sessions are not useful without hydra_oauth2_authentication_request[_handled]. So, it seems fine to delete them.

I created a draft PR. Please have a look and let me know what you think about it. I can add more tests if it is needed.

mohsen3 avatar Jun 14 '21 23:06 mohsen3

One issue with the current implementation is this line:

https://github.com/ory/hydra/blob/0dda22d8106027c12833d04ec019ff10cfd9ab29/persistence/sql/persister_consent.go#L490

It should probably use remember_for rather than ConsentRequestMaxAge().

mohsen3 avatar Jun 14 '21 23:06 mohsen3

Hello contributors!

I am marking this issue as stale as it has not received any engagement from the community or maintainers a year. That does not imply that the issue has no merit! If you feel strongly about this issue

  • open a PR referencing and resolving the issue;
  • leave a comment on it and discuss ideas how you could contribute towards resolving it;
  • leave a comment and describe in detail why this issue is critical for your use case;
  • open a new issue with updated details and a plan on resolving the issue.

Throughout its lifetime, Ory has received over 10.000 issues and PRs. To sustain that growth, we need to prioritize and focus on issues that are important to the community. A good indication of importance, and thus priority, is activity on a topic.

Unfortunately, burnout has become a topic of concern amongst open-source projects.

It can lead to severe personal and health issues as well as opening catastrophic attack vectors.

The motivation for this automation is to help prioritize issues in the backlog and not ignore, reject, or belittle anyone.

If this issue was marked as stale erroneous you can exempt it by adding the backlog label, assigning someone, or setting a milestone for it.

Thank you for your understanding and to anyone who participated in the conversation! And as written above, please do participate in the conversation if this topic is important to you!

Thank you 🙏✌️

github-actions[bot] avatar Jun 15 '22 00:06 github-actions[bot]

@aeneasr hello, we are currently working on this here - https://github.com/ory/hydra/pull/3133

drwatsno avatar Jul 04 '22 14:07 drwatsno

so this has stalled? I really would need to clean up those sessions ... There are millions of rows there on a relatively small site.

Renkas avatar Mar 05 '23 19:03 Renkas

Hello contributors!

I am marking this issue as stale as it has not received any engagement from the community or maintainers for a year. That does not imply that the issue has no merit! If you feel strongly about this issue

  • open a PR referencing and resolving the issue;
  • leave a comment on it and discuss ideas on how you could contribute towards resolving it;
  • leave a comment and describe in detail why this issue is critical for your use case;
  • open a new issue with updated details and a plan for resolving the issue.

Throughout its lifetime, Ory has received over 10.000 issues and PRs. To sustain that growth, we need to prioritize and focus on issues that are important to the community. A good indication of importance, and thus priority, is activity on a topic.

Unfortunately, burnout has become a topic of concern amongst open-source projects.

It can lead to severe personal and health issues as well as opening catastrophic attack vectors.

The motivation for this automation is to help prioritize issues in the backlog and not ignore, reject, or belittle anyone.

If this issue was marked as stale erroneously you can exempt it by adding the backlog label, assigning someone, or setting a milestone for it.

Thank you for your understanding and to anyone who participated in the conversation! And as written above, please do participate in the conversation if this topic is important to you!

Thank you 🙏✌️

github-actions[bot] avatar Mar 05 '24 00:03 github-actions[bot]