Binary data efficiency
I've an use case where I've compared the performance of inserting bytea data into a table with (a) hex-encoded string with ::BYTEA case (b) base64-encoded string with decode(str, 'base64'). It turned out that the later provided 40% better overall throughput in my setup.
I'm wondering: what do we need to implement in the driver to support binary bytea wire transfer? /cc @joeconwaystk
I thought it was implemented? Here are supported types AFAIK: https://github.com/stablekernel/postgresql-dart/blob/eebfc7387d3e2af088efb01fb7248cb9e6ca1fe8/lib/src/binary_codec.dart#L283
Hm, maybe I'm missing something, but how can I use it? If I add an Uint8List in substitionValues, I've got the following stacktrace, which is using the text encoder:
#0 PostgresTextEncoder.convert (package:postgres/src/text_codec.dart:40:5)
#1 PostgreSQLFormat.substitute.<anonymous closure> (package:postgres/src/substituter.dart:59:44)
#2 PostgreSQLFormat.substitute.<anonymous closure> (package:postgres/src/substituter.dart:125:26)
#3 MappedListIterable.elementAt (dart:_internal/iterable.dart:414:29)
#4 ListIterable.join (dart:_internal/iterable.dart:165:22)
#5 PostgreSQLFormat.substitute (package:postgres/src/substituter.dart:134:8)
#6 Query.sendSimple (package:postgres/src/query.dart:45:38)
#7 _PostgreSQLConnectionStateIdle.processQuery
Did you suffix your substitution variable? e.g. INSERT INTO t (v) VALUES (@v:bytea) That's the only way the driver knows to use the binary encoder IIRC.
Uint8List content; // .. initialized
await conn.execute(
"INSERT INTO x.slice (id, slice, content) VALUES (@id, @slice, @content:bytea)",
substitutionValues: {'id': id, 'slice': 0, 'content': content});
(same exception)
Looks like execute will always use the text protocol and query will use the binary protocol. I don't recall why that is or if it should remain that way.
I'm happy to help with this but not sure where to look / what to change. Any suggestion?
Sure, in connection.dart:307 and connection.dart:337. There is a field named onlyReturnAffectedRowCount that is true when using execute. In query.dart, sendSimple (text protocol) is executed if this flag is true, otherwise sendExtended (binary protocol) is executed.
The thing I can't recall (and won't have a chance to dig into for a bit) is why onlyReturnAffectedRowCount is tied to whether or not the binary/text protocol is used. This flag looks to have an impact on what is returned, too - this may just be bad programming on my part by wrapping multiple facets into a single boolean.
I've switched the sendSimple to sendExtended and it speed up this specific benchmark by 70-80%. Thanks for the pointer, I'd probably test it on my real DB too, and see how that goes.
Some add'l info - there does need to be a way to send simple queries because not all data types are supported. The benefit to the binary protocol is that binary data is smaller, but also, query queries are stored as prepared statements by default. The identifier for the prepared statement is generated the first time a query is executed. The format string of the query (so, the query minus any substitution variable values) is the key, and the statement identifier is the value. Most of the speed gains is be due to using prepared statements.
Would it be reasonable to detect the substitution types and if all of them are supported, the query would be using the binary protocol + prepared statements?
I have benchmarked a write-heavy bytea workload with INSERT ... inside .execute vs. inside .query. Debugging showed that the later used the binary protocol, ~~yet, I haven't seen much improvement over the text protocol + base64 encoding.~~ and it was much faster. (My fist benchmark was wrong.)
I guess it means that I can just use the .query for the performance-critical inserts/updates, and that part of the driver does not need to change much...