vertx-sql-client icon indicating copy to clipboard operation
vertx-sql-client copied to clipboard

Improve the efficiency of MySQL preparedBatch

Open BillyYccc opened this issue 5 years ago • 26 comments

Currently prepareBatch implementation in MySQL is just an emulation for executing preparedQuery several times, MySQL Client/Server protocol does not support pipelining the requests therefore users don't benefit much in respect of performance.

MySQL JDBC Connector provides an option rewriteBatchedStatements to combine multiple inserts into one insert command, this will make better utilization of network because it reduces the round trips here. I think we could support this useful feature in the MySQL client.

BillyYccc avatar Sep 19 '19 12:09 BillyYccc

does it imply to aggregate SQL statements and parsing them ?

vietj avatar Sep 19 '19 14:09 vietj

yes and this feature is often used with https://github.com/eclipse-vertx/vertx-sql-client/issues/422.

BillyYccc avatar Sep 19 '19 14:09 BillyYccc

so it does imply interpolation ?

vietj avatar Sep 20 '19 07:09 vietj

No, interpolation is NOT a must.

Generally speaking it's something like rewriting INSERT INTO test VALUES (?, ?) to INSERT INTO test VALUES (?, ?),(?, ?),(?, ?) for a INSERT batch with 3 groups of parameters.

BillyYccc avatar Sep 20 '19 08:09 BillyYccc

ok that might be fine then, what I don't want is ourselves to replace values into strings

vietj avatar Sep 20 '19 14:09 vietj

that's open door for security issues

vietj avatar Sep 20 '19 14:09 vietj

This issue is rather old, I guess there aren't any plans on picking this up again, right? @vietj

Just wanted to point out that this is also very relevant for PostgreSQL where a similar property exists in the JDBC driver, see for example this guide: https://vladmihalcea.com/postgresql-multi-row-insert-rewritebatchedinserts-property/

And the performance of regular batching is not great in comparison to multi-value inserts because the inserts are still executed individually on the database. We did some simple tests with a dockerized PostgreSQL on an admittedly underpowered dev machine, but the performance difference is very noticable. Inserting 10k rows in a table took around 5 minutes (!) with a batch size of 50, while inserting 10k rows with multi-value inserts took 5 seconds.

It would be great if a similar "rewrite" feature to transform batches into multi-value inserts could be introduced in the vertx-sql-client. For example Hibernate Reactive supports batching with a simple property, so if we had such a rewrite feature in the sql-client, you would get fast inserts "for free". In contrast to that, Hibernate does not automatically generate multi-value inserts, they need to be written manually using native queries.

markusdlugi avatar Sep 02 '21 12:09 markusdlugi

@markusdlugi it seems that this feature would require to parse SQL to rewrite, doesn't it ?

vietj avatar Sep 02 '21 13:09 vietj

so for PostgreSQL it means rewriting statement like insert into tab(a,b,c) values (?,?,?), (?,?,?), ..., (?,?,?) which means parsing and rewriting SQL. So unless we can re-use an existing correct SQL parser, I don't see a trivial solution to this unless we provide a facility for the user to provide a builder that we would use to generate correct SQL.

vietj avatar Sep 02 '21 13:09 vietj

I think this is the relevant method doing the rewrite in the PostgreSQL JDBC Driver:

https://github.com/pgjdbc/pgjdbc/blob/f61fbfe7b72ccf2ca0ac2e2c366230fdb93260e5/pgjdbc/src/main/java/org/postgresql/core/v3/BatchedQuery.java#L89

Called by this method:

https://github.com/pgjdbc/pgjdbc/blob/f61fbfe7b72ccf2ca0ac2e2c366230fdb93260e5/pgjdbc/src/main/java/org/postgresql/core/v3/QueryExecutorImpl.java#L252

I only had a brief look at the code. Obviously, you do need to change the SQL that is sent to the server, which is why they build a new SQL string using StringBuilder. However, it seems like they are not manually inserting any parameter values into the SQL string (which would make you vulnerable for SQL injections, you're right).

It's just that instead of sending a prepared statement with multiple batches where each batch is inserted and executed individually at the server, you rewrite it to one insert statement where each batch is added as (?1, ?2, ?3, ...). But you're still working with a prepared statement.

I admit the code is not pretty and it's not nice to manually fiddle with the SQL string, but the performance improvement speaks for itself :)

markusdlugi avatar Sep 02 '21 13:09 markusdlugi

sure it looks great, I'm wondering why batch inserts are not as efficient

vietj avatar Sep 02 '21 13:09 vietj

I will try to find a supportable solution to this

vietj avatar Sep 02 '21 13:09 vietj

I think we could provide something like:

MultiQuery<RowSet<Row>> create(Collector<CharSequence, ?, String> collector1, Collector<CharSequence, ?, String> collector2)

that would be used this way

MultiQuery<RowSet<Row>>  query = MultiQuery.create(Collectors.joining(",", "insert into tab(a,b,c) values ", ""), Collectors.joining(",", "(", ")"));

The first collector would assemble the query and the second collector would assemble the tuple.

vietj avatar Sep 02 '21 13:09 vietj

Sounds like a good idea to use Collectors.joining() instead of manually concatenating everything, at least would be way more maintainable.

However, I'm not quite sure if I understand the idea correctly. So would you offer this as a new API? Or would it be also possible to offer some property (such as the "reWriteBatchedInserts") which automatically does the above transformation for batched inserts if the property is enabled?

I'd prefer the latter because as mentioned, at least in our applications we are currently only using the vertx-sql-client indirectly via Hibernate Reactive, and a new API would have to be supported by Hibernate Reactive first.

markusdlugi avatar Sep 02 '21 14:09 markusdlugi

that would be a new API. Supporting something automatic is not something the project wants to do unless there is a reliable and supportable way to parse SQL.

Hibernate Reactive could be modified to use this API, i.e it would generate the collectors instead of generating the SQL. Note that hibernate reactive could also work currently with the existing API and generate the same SQL that you propose to rewrite, i.e it would concatenate the values clauses along with the concatenated Tuple.

That is something you can ask the Hibernate team could implement or that you could contribute.

vietj avatar Sep 02 '21 22:09 vietj

Okay I see, thanks for your help @vietj

@Sanne @gavinking What do you guys think? Is this something Hibernate Reactive could support?

markusdlugi avatar Sep 03 '21 04:09 markusdlugi

yes sure Hibernate could support this theoretically, but rather than aggregation of individually generated SQL statements I would prefer to generate the optimal SQL right away.

Which implies it would be best to wait for Hibernate 6, if that's possible?

Sanne avatar Sep 03 '21 06:09 Sanne

thanks @Sanne , my opinion is that a client should in practice remains agnostic of SQL that is tied to the application

On Fri, Sep 3, 2021 at 8:52 AM Sanne Grinovero @.***> wrote:

yes sure Hibernate could support this theoretically, but rather than aggregation of individually generated SQL statements I would prefer to generate the optimal SQL right away.

Which implies it would be best to wait for Hibernate 6, if that's possible?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/eclipse-vertx/vertx-sql-client/issues/423#issuecomment-912302224, or unsubscribe https://github.com/notifications/unsubscribe-auth/AABXDCU4ZNZRPTFIPBWTWU3UABWEHANCNFSM4IYKUY5Q .

vietj avatar Sep 03 '21 10:09 vietj

another way to do it, would be to have somebody write a wrapper of the Vert.x SQL Client API that would perform such query optimization

vietj avatar Sep 03 '21 10:09 vietj

thanks @Sanne , my opinion is that a client should in practice remains agnostic of SQL that is tied to the application

+1

The posgresql (JDBC) driver also does parse the SQL in some situations, and while it works fairly well it's sometimes problematic - especially if one aims at writing efficient and lean code.

Sanne avatar Sep 03 '21 10:09 Sanne

yes sure Hibernate could support this theoretically, but rather than aggregation of individually generated SQL statements I would prefer to generate the optimal SQL right away. Which implies it would be best to wait for Hibernate 6, if that's possible?

I agree, it would be ideal if Hibernate generated the optimal SQL right away. I'm not familiar with your roadmap, when do you plan to release Hibernate 6 or rather switch to that in HR? If it's not too far out, we might also wait for that. Although I guess we would still have to change something in Hibernate to allow generation of multi-value inserts, or is that actually a feature of Hibernate 6?

thanks @Sanne , my opinion is that a client should in practice remains agnostic of SQL that is tied to the application

+1 The posgresql (JDBC) driver also does parse the SQL in some situations, and while it works fairly well it's sometimes problematic - especially if one aims at writing efficient and lean code.

I mean, I mainly came up with the idea because of the feature that currently exists in JDBC but not in the vertx-sql-client. I think if it's behind a feature flag that users have to opt-in to use then it might be okay, but I also understand if you're reluctant to do such a thing in the client since you normally don't touch the SQL directly.

In the end, I guess this would only be a band-aid to remedy that Hibernate generates code which is inefficient, so I agree that it would best be fixed in Hibernate.

markusdlugi avatar Sep 03 '21 10:09 markusdlugi

Right.

However to be clear @markusdlugi : waiting for Hibernate Reactive to upgrade to Hibernate ORM core 6 and have this efficiencly implemented in 6 will take quite some more time - so if you have a real and urgent need for this, while I can't justify spending my own time on it I'd be totally fine to merge a band-aid solution protected by an experimental feature flag if you want to contribute that. It could certainly be useful for others as well, and to us as well to compare with when we'll polish this in 6.

Sanne avatar Sep 03 '21 11:09 Sanne

I think for instance a smart proxy like SqlProxy could do such thing.

On Fri, Sep 3, 2021 at 12:19 PM Markus Dlugi @.***> wrote:

yes sure Hibernate could support this theoretically, but rather than aggregation of individually generated SQL statements I would prefer to generate the optimal SQL right away. Which implies it would be best to wait for Hibernate 6, if that's possible?

I agree, it would be ideal if Hibernate generated the optimal SQL right away. I'm not familiar with your roadmap, when do you plan to release Hibernate 6 or rather switch to that in HR? If it's not too far out, we might also wait for that. Although I guess we would still have to change something in Hibernate to allow generation of multi-value inserts, or is that actually a feature of Hibernate 6?

thanks @Sanne https://github.com/Sanne , my opinion is that a client should in practice remains agnostic of SQL that is tied to the application

+1 The posgresql (JDBC) driver also does parse the SQL in some situations, and while it works fairly well it's sometimes problematic - especially if one aims at writing efficient and lean code.

I mean, I mainly came up with the idea because of the feature that currently exists in JDBC but not in the vertx-sql-client. I think if it's behind a feature flag that users have to opt-in to use then it might be okay, but I also understand if you're reluctant to do such a thing in the client since you normally don't touch the SQL directly.

In the end, I guess this would only be a band-aid to remedy that Hibernate generates code which is inefficient, so I agree that it would best be fixed in Hibernate.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/eclipse-vertx/vertx-sql-client/issues/423#issuecomment-912429430, or unsubscribe https://github.com/notifications/unsubscribe-auth/AABXDCQXRVUCL5FOBABQMODUACOKFANCNFSM4IYKUY5Q .

vietj avatar Sep 03 '21 12:09 vietj

@Sanne @vietj @markusdlugi Reactive Hibernate supports ORM 6. Is support for something similar to "reWriteBatchedInserts" planned? Thx for help

adampoplawski avatar Mar 19 '24 11:03 adampoplawski

No it's not

tsegismont avatar Mar 19 '24 14:03 tsegismont

@tsegismont thx, for quick answer.

adampoplawski avatar Mar 19 '24 14:03 adampoplawski