dbdpg
Fail gracefully on bad UTF-8.
(an updated version, missed one change)
For some reason it's possible to get invalid UTF-8 input in the downgrade step, which makes processing stop abruptly with errors. The problem was observed on Request Tracker 4.2.8 with emails that have UTF-8-encoded subject lines.
This seems like a really hacky fix, and the issue might be somewhere else on the stack, like some kind of a failure to mark a specific string as UTF-8, but I have no idea how to track that down.
This looks like you're passing character strings to DBD::Pg, but not using client_encoding=utf8.
Since version 3.2, the behaviour has been to deal in character strings for client_encoding=utf8 and byte strings (i.e. you need to do your own encoding and decoding) for other encodings.
Falling back to using the UTF-8 representation for non-downgradeable strings means you get inconsistently encoded data in your database: code points in the 0-255 range are represented as-is, while those above are represented by characters corresponding to the bytes in their UTF-8 encoding.
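To make the inconsistency concrete, here is a minimal Python sketch that models the proposed fallback. It is not DBD::Pg code: `downgrade_or_fallback` is a hypothetical name, and Latin-1 encoding stands in for Perl's downgrade to one byte per code point.

```python
def downgrade_or_fallback(text: str) -> bytes:
    """Model of the proposed fallback (hypothetical helper, not DBD::Pg API).

    Strings whose code points all fit in 0-255 become one byte per
    character; any other string is sent as its UTF-8 bytes instead.
    """
    try:
        return text.encode("latin-1")   # "downgrade" succeeded
    except UnicodeEncodeError:
        return text.encode("utf-8")     # fallback for non-downgradeable strings


# The same character (U+00E9) ends up as different bytes depending on
# what else happens to be in the string.
for sample in ["café", "café €"]:
    print(repr(sample), "->", downgrade_or_fallback(sample))
```

In this model, the `é` in the first string is stored as the single byte `0xE9`, while in the second string it is stored as the two bytes `0xC3 0xA9`, so a reader of the column has no reliable way to tell which decoding to apply.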
The error message and the documentation for this clearly need improving, though.
The previous code was correct; this change is wrong. You cannot pass values above 0xFF to the binary fields handled by pg_downgraded_sv, since binary data must be in the range 0x00 ... 0xFF. When the second parameter of sv_utf8_downgrade is false, the function throws an error on invalid input, which is the correct behavior.
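For readers unfamiliar with the downgrade semantics, the strict behaviour described above can be sketched in Python. This is only a model under stated assumptions: `strict_downgrade` is a hypothetical name, and the error text is illustrative rather than the message Perl actually emits.

```python
def strict_downgrade(text: str) -> bytes:
    """Model of a downgrade that refuses wide characters (hypothetical helper).

    Every code point must fit in a single byte (0x00-0xFF); anything
    larger is rejected instead of being silently re-encoded.
    """
    for ch in text:
        if ord(ch) > 0xFF:
            # Illustrative message only, not Perl's real wording.
            raise ValueError(f"code point U+{ord(ch):04X} does not fit in one byte")
    return bytes(ord(ch) for ch in text)


print(strict_downgrade("caf\u00e9"))    # all code points <= 0xFF, succeeds
try:
    strict_downgrade("caf\u00e9 \u20ac")  # U+20AC is above 0xFF
except ValueError as err:
    print("rejected:", err)
```

Failing loudly here keeps the data in the column consistently byte-oriented; the caller is expected to encode character strings explicitly before binding them.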
@ilmari Does this need any work? A doc patch or anything?
Anyone want to improve the docs on this?
@turnstep: I agree that the problem is not in the code but in the documentation, which can be improved. I really do not have time to write documentation updates, but if somebody else prepares a documentation patch, I can find time to review it!
@turnstep: If nobody is going to prepare and send a documentation patch, please at least open an issue stating that the documentation needs to be improved. Then, if somebody gets confused again, we can at least point them to that issue, and the note that the documentation needs improving is not lost.