tomcat icon indicating copy to clipboard operation
tomcat copied to clipboard

Use SELECT...FOR UPDATE to update session data in the database instead of DELETE, INSERT.

Open ChristopherSchultz opened this issue 2 years ago • 14 comments

I have only compile-tested this; I wanted to get feedback on the approach, how to handle errors, etc.

ChristopherSchultz avatar Mar 08 '23 15:03 ChristopherSchultz

@ChristopherSchultz Is there a list of supported database systems with which the DataSourceStore is compatible? Are you sure that they all support "SELECT FOR UPDATE"? I tried to look that up and the "best" thing I found was https://www.sql-workbench.eu/dbms_comparison.html

isapir avatar Mar 08 '23 20:03 isapir

@ChristopherSchultz Is there a list of supported database systems with which the DataSourceStore is compatible? Are you sure that they all support "SELECT FOR UPDATE"? I tried to look that up and the "best" thing I found was https://www.sql-workbench.eu/dbms_comparison.html

Good question. I was originally going to use MERGE but the database I have at-hand (MariaDB) doesn't support it, so I went with SELECT...FOR UPDATE. It's not clear to me if the SELECT...FOR UPDATE NOWAIT listed in that table is specifically talking about the NOWAIT part of the query. For example, I know that both MariaDB and MySQL have supported SELECT...FOR UPDATE for a long time, but the entry for MySQL states "Since 8.0" and MariaDB says "No". A quick check of the HSQLDB Documentation shows that SELECT...FOR UPDATE is available while that list claims it is not supported. Same thing with IBM DB2 and SQL Server. But it's clear that not all RDMBSs support that syntax.

Maybe it's not possible to write a single code-path to satisfy all major RDBMS systems. I'd be happy to change the PR to offer a selection between the two via configuration.

ChristopherSchultz avatar Mar 09 '23 13:03 ChristopherSchultz

Yeah, I guess that site is not up to date. I also used SELECT FOR UPDATE in MySQL 5.7.

There is also INSERT ON CONFLICT UPDATE support in MySQL and Postgres, but it would be difficult to find an optimized solution that fits all.

isapir avatar Mar 09 '23 17:03 isapir

I have a question that why we don't add a real primary key(auto-increment) to solve the problem that primary key constraint violation when insert data to database simultaneously? And we can select session data from table by session-id and order (DESC) by ID (real primary key) when load session, then the newest result is what we need. For save can still keep the original code logic, delete (by session-id ) first and insert later. Thus, we can avoid adding lock(FOR UPDATE or others) from the database level by this way. I think this will work and so simple, but I don't know if there will be any security issues.

Thoughts?

aooohan avatar Mar 10 '23 03:03 aooohan

The problem is that there is a window of opportunity between the existing DELETE and INSERT where the session_id column (which is UNIQUE or equivalent) can be INSERTed by another thread. Changing the definition of the table to include a separate PK and removing the UNIQUE constraint from the session_id column is one way to solve this issue, but it would break any existing installation which is not using that kind of table definition.

I would definitely not want to do that for a point-release, and my goal here is to back-port this all the way back to 8.5 at this point. I think some larger changes could be made separately from this current effort, and restricted to maybe Tomcat 11.0.x and later.

ChristopherSchultz avatar Mar 10 '23 14:03 ChristopherSchultz

I've been thinking about this more and I think it could cause problems for people not using appId+sessionId (or just sessionId) as the primary key for the DB table storing the sessions. INSERT and DELETE are not required to include a value for the PK. Some databases can automatically-allocate a value for that purpose and so the existing DELETE+INSERT scheme works for them. But the SELECT ... FOR UPDATE will not because it does not include the table's PK.

One way to fix this would be to add an optional PK column (columns?) to the configuration.

I think this is pretty important, because databases which use PK-indexed organization (SQL Server, InnoDB, and others) will thrash-around if appId+sessionId are the PK columns, so anybody using those kinds of DBs would likely choose a separate PK column that doesn't cause those ugly performance problems.

Any other thoughts?

ChristopherSchultz avatar Mar 11 '23 16:03 ChristopherSchultz

Considering the difficult of handling all situation for all databases perhaps the following:

Punt the session update operation to a callback or interface, have one or two sensible defaults (like for the major databases and specific table configuration) and have a way for the user to provide their own implementation? Adds a little of complexity but code flow would arguably be cleaner and more organized since the DataStoreSource::save wouldn't need to have a bunch of knowledge on how it works. And it would be more flexible for users with more complex setups that they can't change.

Basically something similar to how the X509UserNameProvider works.

MikeNeilson avatar Mar 11 '23 19:03 MikeNeilson

I would love to see an interface that goes a step further and allows for NoSQL implementations as well. For example, Redis is an excellent option for a data store IMO.

isapir avatar Mar 11 '23 20:03 isapir

One way to fix this would be to add an optional PK column (columns?) to the configuration

+1 for a Primary Key column. It improves performance and is supported by all SQL compliant DBMS's.

isapir avatar Mar 12 '23 23:03 isapir

One way to fix this would be to add an optional PK column (columns?) to the configuration.

+1

aooohan avatar Mar 13 '23 02:03 aooohan

I have a question that why we don't add a real primary key(auto-increment) to solve the problem that primary key constraint violation when insert data to database simultaneously?

The problem is that there is a window of opportunity between the existing DELETE and INSERT where the session_id column (which is UNIQUE or equivalent) can be INSERTed by another thread. Changing the definition of the table to include a separate PK and removing the UNIQUE constraint from the session_id column is one way to solve this issue, but it would break any existing installation which is not using that kind of table definition.

I would definitely not want to do that for a point-release, and my goal here is to back-port this all the way back to 8.5 at this point. I think some larger changes could be made separately from this current effort, and restricted to maybe Tomcat 11.0.x and later.

ChristopherSchultz avatar Mar 13 '23 14:03 ChristopherSchultz

I would love to see an interface that goes a step further and allows for NoSQL implementations as well. For example, Redis is an excellent option for a data store IMO.

This is the whole point of the Store interface. If you want to implement your own strategy, you can implement it however you want. I believe there are some Redis- and Memcached-based Store implementations already available from third-parties.

ChristopherSchultz avatar Mar 13 '23 14:03 ChristopherSchultz

Punt the session update operation to a callback or interface, have one or two sensible defaults (like for the major databases and specific table configuration) and have a way for the user to provide their own implementation?

If you are going to provide your own implementation, you could just subclass DataSourceStore and change the behavior of the store (or load, etc.) methods.

This does bring up a good point: rather than having a new configuration option, I could simply have a subclass of DataSourceStore which overrides the store method to do the SELECT...FOR UPDATE. Then there is a much smaller change to the existing code, which is beneficial for stability. In fact, I think no existing code needs to change; everything is in the new class, including this optional new primary key column option.

ChristopherSchultz avatar Mar 13 '23 14:03 ChristopherSchultz

This does bring up a good point: rather than having a new configuration option, I could simply have a subclass of DataSourceStore which overrides the store method to do the SELECT...FOR UPDATE. Then there is a much smaller change to the existing code, which is beneficial for stability. In fact, I think no existing code needs to change; everything is in the new class, including this optional new primary key column option.

That's a great idea. It can also serve as a model for others who might want to extend the DataSourceStore.

isapir avatar Mar 13 '23 15:03 isapir