
Handle `Array(Bytes)` encoding for query params containing non-Unicode text

Open jgaskins opened this issue 1 year ago • 0 comments

Ran into #267 tonight while trying to bulk-insert MessagePack-encoded data (via `unnest`) and can confirm that `Array(Bytes)` is what it's choking on. Since all arrays are text-encoded, the text encoding for `Bytes` needs to be able to handle binary data.

I tried special-casing `Array(Bytes)` all the way down, but that requires a lot more understanding of the binary encoding format than I currently have, and it would still have been an incomplete solution that only handles one-dimensional `bytea` arrays. Anyone using nested arrays would run into the same problem, so I decided against that.

I don't like that this solution increases the payload size over the wire by encoding every byte as 2 hexadecimal characters. Arrays of strings are implemented in terms of this method, so this also affects `text[]` encoding, but it doesn't affect the encoding of strings outside of arrays, so the performance impact should be minimal for most use cases. Every other PQ implementation I've found so far, other than libpq itself, also uses the hex encoding, so it seems that if we're slow we're at least in good company. I think my ideal solution would be to convert all param types to binary encoding, but this was a far easier step, I couldn't find the specification for the binary encoding, and it's late and I'm sleeeeeeepy.
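To make the hex approach concrete, here is a minimal sketch of how a `Bytes` value could be rendered as a `bytea` element inside a text-encoded array, including nested arrays. The helper names (`bytea_hex`, `encode_element`, `array_literal`) are hypothetical and are not the methods this change actually touches; the sketch only assumes PostgreSQL's documented `\x` hex format for `bytea` and the backslash doubling required inside a quoted array element.

```crystal
# Sketch only: these helpers are hypothetical, not crystal-pg's API.

# Renders one bytea value in PostgreSQL's hex text format: "\x"
# followed by two lowercase hex digits per byte.
def bytea_hex(bytes : Bytes) : String
  "\\x#{bytes.hexstring}"
end

# Writes one array element. Inside an array literal the element is
# double-quoted, so the leading backslash has to be doubled.
def encode_element(io : IO, value : Bytes) : Nil
  io << '"' << '\\' << bytea_hex(value) << '"'
end

# Nested arrays recurse through the same overloads, so any depth works.
def encode_element(io : IO, value : Array) : Nil
  io << '{'
  value.each_with_index do |element, index|
    io << ',' if index > 0
    encode_element(io, element)
  end
  io << '}'
end

def array_literal(value : Array) : String
  String.build { |io| encode_element(io, value) }
end

payload = Bytes[0xff, 0x00, 0x80] # not valid UTF-8

puts bytea_hex(payload)                    # \xff0080
puts array_literal([payload])              # {"\\xff0080"}
puts array_literal([[payload], [payload]]) # {{"\\xff0080"},{"\\xff0080"}}
```

Because the output is pure ASCII, the bytes never have to be valid Unicode, and the recursion is what lets the text path cover nested arrays without any special-casing.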

Please check my work on this. I did confirm that the test case deserializes into the expected string:

```crystal
db.query_each "SELECT '\\x7468697320697320612022736c69636522'::bytea" do |rs|
  puts String.new rs.read(Bytes)
end
# => this is a "slice"
```

… but I'm not confident there isn't something I've overlooked.
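For anyone who wants to double-check the hex string without a database connection, the same round trip can be sketched with the standard library alone. This only exercises `String#hexbytes` and `Slice#hexstring`, not the driver's encoder, so it verifies the test data rather than the change itself.

```crystal
hex = "7468697320697320612022736c69636522"

# Decode the hex digits back into raw bytes, then view them as a String.
bytes = hex.hexbytes
puts String.new(bytes) # => this is a "slice"

# Re-encoding yields the original digits, two hex characters per byte.
puts bytes.hexstring == hex         # => true
puts hex.bytesize == bytes.size * 2 # => true
```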

Fixes #267

jgaskins • Oct 13 '24 07:10