Explicit type casting behaviors in pgx vs database/sql
Describe the bug
I found a slight difference in scanning behavior between pgx and database/sql. It appears pgx is more strict with type conversions.
For example, with this query:
```sql
SELECT
    setting
FROM
    pg_settings
WHERE
    name = 'max_connections';
```
The standard library successfully scans into:
```go
type result struct {
    MaxConnections int
}

var r result
err := row.Scan(
    &r.MaxConnections,
)
```
But pgx's `row.Scan` fails with:

```
can't scan into dest[0] (col: max_connections): cannot scan text (OID 25) in text format into *int
```
The fix is to cast in the query with `setting::int`, but should pgx's behavior match the standard library more closely?
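For reference, the workaround looks like this:

```sql
SELECT
    setting::int
FROM
    pg_settings
WHERE
    name = 'max_connections';
```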
Specifically, I'm referring to the conversion in `convertAssignRows`.
P.S. I don't think this is a bug (the label was added automatically). I'm mainly trying to understand the design decision, and I totally get why being more strict is favorable.
It's semi-intentional. The database/sql interface uses the type of the object being scanned into to determine how to parse the PostgreSQL data. All it knows is that it got a string; it could be a text-formatted `int4`, `int8`, or `text`/`varchar`. pgx uses its knowledge of the underlying PostgreSQL type as well as the destination Go type. This allows things like scanning into a `map[string]string` to work with both `json` and `hstore` underlying PostgreSQL types. But on the other hand, it means that some things that could otherwise work, like `text` or unknown types, don't.
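For example, a minimal snippet of that OID-driven dispatch (assuming an open `*pgx.Conn` named `conn` and a `ctx`):

```go
// pgx sees the column's OID (json here) and picks the codec from it,
// then checks that the destination Go type is compatible.
var m map[string]string
err := conn.QueryRow(ctx, `SELECT '{"a":"1"}'::json`).Scan(&m)
if err != nil {
    return err
}

// The same destination type would also work for an hstore column, since
// the codec is chosen from the PostgreSQL type rather than guessed from
// the Go type. (hstore is an extension type, so it may need to be
// registered first via conn.LoadType + conn.TypeMap().RegisterType.)
```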
Strictly requiring the correct types isn't a bad thing, but I'm not opposed to making some common situations like this more convenient. I can think of two ways this could be done:
1. `TextCodec` could add support for scanning into additional interfaces like `Int64Scanner`. This would be pretty simple and would resolve the above case.
2. The core scanning path has a special case for scanning text-formatted PostgreSQL values into strings regardless of the actual PostgreSQL type. This allows scanning PostgreSQL types that are unknown to pgx. Potentially, this path could be expanded. This has the advantage of supporting more use cases, but the disadvantage of being a larger and trickier change.
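A rough sketch of what option 1 could look like (illustrative only, not the actual patch; it assumes pgx v5's existing `pgtype.Int64Scanner` and `pgtype.Int8` types):

```go
package pgtypesketch

import (
    "fmt"
    "strconv"

    "github.com/jackc/pgx/v5/pgtype"
)

// scanPlanTextToInt64Scanner is a hypothetical scan plan that TextCodec
// could return when the destination implements pgtype.Int64Scanner.
type scanPlanTextToInt64Scanner struct{}

func (scanPlanTextToInt64Scanner) Scan(src []byte, dst any) error {
    s, ok := dst.(pgtype.Int64Scanner)
    if !ok {
        return fmt.Errorf("dst %T does not implement pgtype.Int64Scanner", dst)
    }
    if src == nil {
        // NULL maps to an Int8 with Valid == false.
        return s.ScanInt64(pgtype.Int8{})
    }
    n, err := strconv.ParseInt(string(src), 10, 64)
    if err != nil {
        return fmt.Errorf("cannot parse %q as int64: %w", string(src), err)
    }
    return s.ScanInt64(pgtype.Int8{Int64: n, Valid: true})
}
```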
Thanks for the detailed explanation!
I believe you're referring to the pre-defined mappings here, right?
https://github.com/jackc/pgx/blob/ff9c26d8516879a5d48ae9d729c994d58e64777d/pgtype/pgtype_default.go
Assuming you'd be up for addressing this, do you see this as an opt-in option, or would this "just work", with the implementation relaxed by default?
I could see this going both ways, tbh, although I have a preference for option 1.
- Relaxed by default:
  a. better migration path from database/sql (my use case, selfishly)
  b. reduces friction for those common cases (might avoid the infamous `DatabaseRegisterDataTypes`)
  c. matches expectations, especially when Postgres returns text-formatted numbers (or string-based enums?)
- Opt-in:
  a. preserves pgx's current strict type-safety guarantees
  b. no risk of breaking code (not sure what this could be, but FUD)
  c. users explicitly choose when they want the convenience over strictness

(Given this hasn't come up in database/sql often enough, most folks are probably fine with relaxed as the default behavior, since it only performs these conversions when it's 100% certain there's no loss of information.)
After a little further thought, I found a third place that might be better: the `u?int(8|16|32|64)?Wrapper` types (and, I suppose, `float(32|64)Wrapper`).
If these implemented `TextScanner`, then they could try to parse any text-formatted PostgreSQL value.
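A minimal sketch of that idea (assuming pgx v5's `pgtype.TextScanner` and `pgtype.Text`; `int64Wrapper` here is a stand-in for the internal wrapper type):

```go
package pgtypesketch

import (
    "fmt"
    "strconv"

    "github.com/jackc/pgx/v5/pgtype"
)

// int64Wrapper stands in for pgtype's internal wrapper around *int64.
type int64Wrapper int64

// ScanText lets the wrapper parse any text-formatted value, regardless
// of the column's PostgreSQL type.
func (w *int64Wrapper) ScanText(v pgtype.Text) error {
    if !v.Valid {
        return fmt.Errorf("cannot scan NULL into *int64")
    }
    n, err := strconv.ParseInt(v.String, 10, 64)
    if err != nil {
        return fmt.Errorf("cannot parse %q as int64: %w", v.String, err)
    }
    *w = int64Wrapper(n)
    return nil
}
```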
Regardless of which approach, I would have this be the default behavior. There are already an enormous number of possible cases in the pgtype system and making this configurable would add even more.
Perfect, thanks for the feedback.
Are you okay with others taking a stab at a PR, or do you anticipate this being more involved and better suited for you (or someone closer to the project) to implement? I'd be happy to give it a go if needed, as this would resolve a few issues for me.
Took a quick look last night at what a minimal changeset might look like. Pushed it up here:
https://github.com/jackc/pgx/compare/master...mfridman:pgx:mf/oid-text
Needs tests and more thought on edge cases, but it works for the basic scenarios (did a bit of manual testing).
- Expanding `pgtype/text.go` to support text-to-numeric/bool scan plans
- Adding unknown-OID fallback helpers in `pgtype/pgtype.go` for both encoding and decoding
For example, a Postgres enum type:
```json
[
  {
    "typname": "my_enum",
    "oid": "16650",
    "typarray": "16649"
  },
  {
    "typname": "_my_enum",
    "oid": "16649",
    "typarray": "0"
  }
]
```
And the corresponding Go type:
```go
type MyEnum string

const (
    MyEnumFoo MyEnum = "foo"
)
```
Currently results in:
```
cannot scan unknown type (OID XXXX) in text format into *[]MyEnum
```
With this fallback in place, enums (and arrays of enums) scan cleanly into string aliases without prior registration. This seems especially useful for sqlc users who generate enum wrappers (like me).
I just made a change (b593d5a4162fe368629bf71986ecf86bed88804d) for #2399, which may impact this slightly.
I'm fairly comfortable with automatically parsing PostgreSQL text-formatted values into Go numeric types even when the exact types don't match. But https://github.com/jackc/pgx/compare/master...mfridman:pgx:mf/oid-text seems to be a more extensive change than the original issue of scanning PostgreSQL text into a Go int. And given that this change seems to trigger based on unknown OIDs instead of known incompatible types, I'm not sure it would work for the original use case.
The changes to the encoding system make me even more nervous as errors there could lead to data corruption. It's trying to make pgx correctly encode an unknown Go type into an unknown PostgreSQL type.
As far as enums and arrays of enums go: enums should work directly as is. Arrays of enums require registration. But this can be done with a small function that queries the PostgreSQL schema for all enums and enum arrays and calls `LoadType`(s) on the result, along the lines of the sketch below.
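A minimal sketch of such a helper (assuming pgx v5; `registerEnums` and the catalog query are illustrative, not part of pgx):

```go
package example

import (
    "context"

    "github.com/jackc/pgx/v5"
)

// registerEnums loads every enum type (and its array type) from the
// database and registers them on the connection's type map.
func registerEnums(ctx context.Context, conn *pgx.Conn) error {
    rows, err := conn.Query(ctx, `SELECT typname FROM pg_type WHERE typtype = 'e'`)
    if err != nil {
        return err
    }
    names, err := pgx.CollectRows(rows, pgx.RowTo[string])
    if err != nil {
        return err
    }
    for _, name := range names {
        // Load the enum itself first; its array type ("_" + name)
        // requires the element type to be registered already.
        for _, n := range []string{name, "_" + name} {
            t, err := conn.LoadType(ctx, n)
            if err != nil {
                return err
            }
            conn.TypeMap().RegisterType(t)
        }
    }
    return nil
}
```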