postgresql-dart icon indicating copy to clipboard operation
postgresql-dart copied to clipboard

Binary data efficiency

Open isoos opened this issue 7 years ago • 11 comments

I've an use case where I've compared the performance of inserting bytea data into a table with (a) hex-encoded string with ::BYTEA case (b) base64-encoded string with decode(str, 'base64'). It turned out that the later provided 40% better overall throughput in my setup.

I'm wondering: what do we need to implement in the driver to support binary bytea wire transfer? /cc @joeconwaystk

isoos avatar Oct 30 '18 18:10 isoos

I thought it was implemented? Here are supported types AFAIK: https://github.com/stablekernel/postgresql-dart/blob/eebfc7387d3e2af088efb01fb7248cb9e6ca1fe8/lib/src/binary_codec.dart#L283

itsjoeconway avatar Oct 30 '18 19:10 itsjoeconway

Hm, maybe I'm missing something, but how can I use it? If I add an Uint8List in substitionValues, I've got the following stacktrace, which is using the text encoder:

#0      PostgresTextEncoder.convert (package:postgres/src/text_codec.dart:40:5)
#1      PostgreSQLFormat.substitute.<anonymous closure> (package:postgres/src/substituter.dart:59:44)
#2      PostgreSQLFormat.substitute.<anonymous closure> (package:postgres/src/substituter.dart:125:26)
#3      MappedListIterable.elementAt (dart:_internal/iterable.dart:414:29)
#4      ListIterable.join (dart:_internal/iterable.dart:165:22)
#5      PostgreSQLFormat.substitute (package:postgres/src/substituter.dart:134:8)
#6      Query.sendSimple (package:postgres/src/query.dart:45:38)
#7      _PostgreSQLConnectionStateIdle.processQuery

isoos avatar Oct 30 '18 19:10 isoos

Did you suffix your substitution variable? e.g. INSERT INTO t (v) VALUES (@v:bytea) That's the only way the driver knows to use the binary encoder IIRC.

itsjoeconway avatar Oct 30 '18 19:10 itsjoeconway

    Uint8List content; // .. initialized
    await conn.execute(
        "INSERT INTO x.slice (id, slice, content) VALUES (@id, @slice, @content:bytea)",
        substitutionValues: {'id': id, 'slice': 0, 'content': content});

(same exception)

isoos avatar Oct 30 '18 20:10 isoos

Looks like execute will always use the text protocol and query will use the binary protocol. I don't recall why that is or if it should remain that way.

itsjoeconway avatar Oct 30 '18 20:10 itsjoeconway

I'm happy to help with this but not sure where to look / what to change. Any suggestion?

isoos avatar Oct 30 '18 20:10 isoos

Sure, in connection.dart:307 and connection.dart:337. There is a field named onlyReturnAffectedRowCount that is true when using execute. In query.dart, sendSimple (text protocol) is executed if this flag is true, otherwise sendExtended (binary protocol) is executed.

The thing I can't recall (and won't have a chance to dig into for a bit) is why onlyReturnAffectedRowCount is tied to whether or not the binary/text protocol is used. This flag looks to have an impact on what is returned, too - this may just be bad programming on my part by wrapping multiple facets into a single boolean.

itsjoeconway avatar Oct 30 '18 20:10 itsjoeconway

I've switched the sendSimple to sendExtended and it speed up this specific benchmark by 70-80%. Thanks for the pointer, I'd probably test it on my real DB too, and see how that goes.

isoos avatar Oct 30 '18 21:10 isoos

Some add'l info - there does need to be a way to send simple queries because not all data types are supported. The benefit to the binary protocol is that binary data is smaller, but also, query queries are stored as prepared statements by default. The identifier for the prepared statement is generated the first time a query is executed. The format string of the query (so, the query minus any substitution variable values) is the key, and the statement identifier is the value. Most of the speed gains is be due to using prepared statements.

itsjoeconway avatar Oct 30 '18 22:10 itsjoeconway

Would it be reasonable to detect the substitution types and if all of them are supported, the query would be using the binary protocol + prepared statements?

isoos avatar Oct 30 '18 23:10 isoos

I have benchmarked a write-heavy bytea workload with INSERT ... inside .execute vs. inside .query. Debugging showed that the later used the binary protocol, ~~yet, I haven't seen much improvement over the text protocol + base64 encoding.~~ and it was much faster. (My fist benchmark was wrong.)

I guess it means that I can just use the .query for the performance-critical inserts/updates, and that part of the driver does not need to change much...

isoos avatar Dec 22 '18 00:12 isoos