postgresql-simple
postgresql-simple copied to clipboard
Should we deprecate the ToField instance for ByteString?
Currently, in ToField, the ByteString instance treats the argument as text format. For example:
> let bs = "\\123" :: ByteString
> bs
"\\123"
> query conn "SELECT ? :: TEXT" (Only bs) :: IO [Only ByteString]
[Only {fromOnly = "\\123"}]
> query conn "SELECT ? :: BYTEA" (Only bs) :: IO [Only ByteString]
[Only {fromOnly = "S"}] -- whoops
query conn "SELECT ? :: BYTEA" (Only (Binary bs)) :: IO [Only ByteString]
> [Only {fromOnly = "\\123"}]
If users represent BYTEA using ByteString, their code will silently be wrong, since ToField won't escape the data properly for BYTEA's input function (but it will for Binary
, of course). I don't think FromField has this problem, as it checks the typeOid
of the incoming data.
I propose that we remove the ToField instance for ByteString. Perhaps we could provide a newtype wrapper to access the same functionality, similar to the Binary wrapper:
newtype TextFormat a = TextFormat a
Actually, there might be a reasonable alternative: pass the parameter in binary format. I think this would handle both TEXT and BYTEA correctly. A downside is that, if we wanted to support rendering SQL queries for external use, or rendering COPY FROM data, we wouldn't be able to use ToField
as-is, as it would lack the distinction between TEXT and BYTEA needed for rendering parameters in text format.
Thoughts?
Personally, if postgresql-simple was unreleased (or very new) I probably would have made this decision in hindsight. As is, I'm a bit indifferent, but am afraid that this would cause more pain than it's worth. I'd like to hear from some other users how this would impact their apps (or not...)
A fair bit of my code does depend on the current behavior and I'm not really prepared to deal with this breakage at the moment, so I certainly would need a long-ish deprecation period.
Another idea, do you think we should modify the FromField instance to handle both Binary and Text?
Hmm, upon further review, I think I misunderstood this request. Maybe. Out of curiousity, can you mark an instance as deprecated and get a warning if you use it with GHC?
i just want to note, that aeson removed the bytestring instances, so users might well be aware that bytestring for textual data is broken. so now might be a good time for that change (with version upgrade according to pvp of course).
This instance should be removed, for the same reason as Aeson did it: The behavior is very surprising. In Aeson's case, since JSON is actually text-based and there is no single canonical way to convert arbitrary binary data to text, removal was the only option; here, it could be made to do the same thing as Binary - but why? The user can easily specify, when that's desired.
Yes, I was just bitten by this, although it was a system still in development, so no big deal.
Well, it is documented that the binary data is assumed to be UTF-8. More accurately it should match the character encoding for the database, but for better or worse postgresql-simple pretty much assumes UTF-8, as there are other places I'm aware of that make this assumption and probably some that I'm not.
This instance was imported from mysql-simple without much thought on my part, at this point I wouldn't have done this had I known better at the time, but I'm also not enthusiastic about making the change either.
I mean, it's not text! It's binary data! It's one thing to assume that text is in some particular encoding; it's quite another to mishandle raw binary values, which are not necessarily text at all, through an encode-and-decode pass, thereby disallowing any values that aren't legitimate in that encoding.
The fact that this happens is due to the implementation detail that they are spliced in as raw SQL instead of being passed as bound parameters, which was surprising in itself, since I thought that was possible although I haven't investigated the details of the protocol.
But I guess it doesn't really matter, now that I know about the behavior and the Binary newtype.
I agree. But since it doesn't really matter that much, I'm not inclined to "fix" it and break everybody's code. So this change would have to come:
- At the same time as a major interface overhaul that's breaking a lot of other things anyway, and for better reasons. Which I think might be plausible, but I dunno.
- Something to fix in whatever comes after postgresql-simple, whether that is written by myself or somebody else.
As for protocol-level parameters, yes, postgresql does support them, and yes, postgresql-simple should optionally support them. The problem is that, at least with libpq, you can't use both protocol-level parameters and issue multiple statements in one round trip. (Ick, I know.) However, I have looked at the underlying protocol a bit and, IIRC, there might be a way to support protocol-level parameters and multiple statements, as there are some tricks you can pull at the protocol level that libpq doesn't support.
So I do think there are some good reasons to produce a native haskell alternative to libpq, that would expose some of these lower-level details, but the big problem is supporting all the connection and authentication options that libpq does.
My thought on that count is to figure out if we could expose libpq's lower-level IO routines, so we could use a modified libpq to connect and authenticate, and then implement the protocol itself in Haskell. It's still a pretty significant undertaking for gains that aren't entirely clear to me at this point.
Anyway, I digress. As far as supporting protocol-level parameters using libpq, I think there needs to be a way to either interpolate or use protocol-level parameters, or some combination thereof. Getting the interface right so that it's predictable and unified may be a bit tricky, however.
As for concrete advantages of a native implementation, templatepg implements a small subset of the frontend-backend protocol. I don't think it's particularly possible to implement templatepg on top of libpq, but I could be wrong.
A fair position. I'm glad we talked it through, and thank you for the detailed response.
My pleasure =).
Although this issue is long time inactive I will share my experience with this:
I ran into a problem with this instance recently. Essentially I expected to store ByteString
binary data and I was very surprised to notice that my database BYTEA values were from time to time empty or partial due to a \NUL
character in my ByteString
i was inserting.
I would expect it to store binary data only not text. As mentioned above it should be removed altogether (as in case of Aeson
) or store binary data only and skip any encoding at all (UTF8 or default database encoding). As a consequence It just might make the Binary
wrapper redundant ?
I understand the issue with backwards compatibility and breaking everyones code but the current implementation just feels wrong.
This is also an issue with Text
- NUL
in a Text is perfectly valid.