
Explicit type casting behaviors in pgx vs database/sql

Open mfridman opened this issue 8 months ago • 6 comments

Describe the bug

I found a slight difference in scanning behavior between pgx and database/sql. It appears pgx is more strict with type conversions.

For example, with this query:

SELECT
    setting
FROM
    pg_settings
WHERE
    name = 'max_connections';

The standard library successfully scans into:

type result struct {
    MaxConnections int
}
var r result
err := row.Scan(
    &r.MaxConnections,
)

But pgx's row.Scan fails with:

can't scan into dest[0] (col: max_connections): cannot scan text (OID 25) in text format into *int

The workaround is to cast in the query with setting::int, but should pgx's behavior match the standard library more closely?

Specifically, I'm referring to the conversion in database/sql's convertAssignRows.

P.S. I don't think this is a bug (the label was added automatically). I'm mainly trying to understand the design decision, and I totally get why being more strict is favorable.

mfridman avatar Apr 16 '25 23:04 mfridman

It's semi-intentional. The database/sql interface uses the type of the object being scanned into to determine how to parse the PostgreSQL data. All it knows is that it got a string, which could be a text-formatted int4, int8, or text/varchar. pgx uses its knowledge of the underlying PostgreSQL type as well as the destination Go type. This allows things like scanning into a map[string]string to work with both json and hstore underlying PostgreSQL types. On the other hand, it means that some things that could otherwise work, like text or unknown types, don't.

Strictly requiring the correct types isn't a bad thing, but I'm not opposed to making some common situations like this more convenient. There are two ways I can think of that this could be done.

  • TextCodec could add support for scanning into additional interfaces like Int64Scanner. This would be pretty simple and would resolve the above case.
  • The core scanning path has a special case for scanning text formatted PostgreSQL values into strings regardless of actual PostgreSQL type. This allows scanning PostgreSQL types that are unknown to pgx. Potentially, this path could be expanded. This has the advantage of supporting more use cases, but the disadvantage of being a larger and trickier change.

jackc avatar Apr 26 '25 14:04 jackc

Thanks for the detailed explanation!

I believe you're referring to the pre-defined mappings here, right?

https://github.com/jackc/pgx/blob/ff9c26d8516879a5d48ae9d729c994d58e64777d/pgtype/pgtype_default.go

Assuming you'd be up for addressing this, do you see it as an opt-in option, or would it "just work", with the implementation relaxed by default?

I could see this going both ways tbh. Although I have a preference for option 1.

  1. Relaxed by default:
     a. Better migration path from database/sql (my use case, selfishly)
     b. Reduces friction for those common cases (might avoid the infamous DatabaseRegisterDataTypes)
     c. Matches expectations, especially when Postgres returns text-formatted numbers (or string-based enums?)
  2. Opt-in:
     a. Preserves pgx's current strict type-safety guarantees
     b. No risk of breaking code (not sure what this could be, but FUD)
     c. Users explicitly choose when they want the convenience over strictness. (Given this hasn't come up in database/sql often enough, most folks are probably fine with this as default behavior, since it only performs these conversions when it's 100% certain there's no loss of information)

mfridman avatar Sep 17 '25 13:09 mfridman

After a little further thought, I found a third place that might be better: u?int(8|16|32|64)?Wrapper (and I suppose float(32|64)Wrapper).

If these implemented TextScanner, then they could try to parse any text formatted PostgreSQL value.

Regardless of which approach, I would have this be the default behavior. There are already an enormous number of possible cases in the pgtype system and making this configurable would add even more.

jackc avatar Oct 04 '25 01:10 jackc

Perfect, thanks for the feedback.

Are you okay with others taking a stab at a PR, or do you anticipate this being more involved and better suited for you (or someone closer to the project) to implement? I'd be happy to give it a go if needed, as this would resolve a few issues for me.

mfridman avatar Oct 06 '25 16:10 mfridman

Took a quick look last night at what a minimal changeset might look like. Pushed it up here:

https://github.com/jackc/pgx/compare/master...mfridman:pgx:mf/oid-text

Needs tests and more thought on edge cases, but it works for the basic scenarios (did a bit of manual testing).

  1. Expanded pgtype/text.go to support text-to-numeric/bool scan plans
  2. Added unknown-OID fallback helpers in pgtype/pgtype.go for both encoding and decoding

For example, a Postgres enum type:

[
  {
    "typname": "my_enum",
    "oid": "16650",
    "typarray": "16649"
  },
  {
    "typname": "_my_enum",
    "oid": "16649",
    "typarray": "0"
  }
]

And the corresponding Go type:

type MyEnum string

const (
    MyEnumFoo MyEnum = "foo"
)

Currently results in:

cannot scan unknown type (OID XXXX) in text format into *[]MyEnum

With this fallback in place, enums (and arrays of enums) scan cleanly into string aliases without prior registration. This seems especially useful for sqlc users who generate enum wrappers (like me).

mfridman avatar Oct 07 '25 02:10 mfridman

I just made a change (b593d5a4162fe368629bf71986ecf86bed88804d) for #2399 which may impact this slightly.

I'm fairly comfortable with automatically parsing PostgreSQL text formatted values into Go numeric types even when the exact types don't match. But https://github.com/jackc/pgx/compare/master...mfridman:pgx:mf/oid-text seems to be a more extensive change than the original issue of scanning PostgreSQL text into a Go int. And given that this change seems to trigger based on unknown OIDs instead of known incompatible types, I'm not sure it would work for the original use case.

The changes to the encoding system make me even more nervous as errors there could lead to data corruption. It's trying to make pgx correctly encode an unknown Go type into an unknown PostgreSQL type.


As far as enums and arrays of enums go: enums should work directly as is. Arrays of enums require registration. But this can be done with a small function that queries the PostgreSQL schema for all enums and enum arrays and calls LoadType(s) on the result.
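The catalog query such a helper might run could look something like this sketch (column names are from pg_type; the exact filter and how the names are fed to LoadType are assumptions):

```sql
-- Enum types, plus their array types via typarray, so both
-- (e.g. my_enum and _my_enum) can be registered up front.
SELECT t.typname
FROM pg_type t
WHERE t.typtype = 'e'
UNION ALL
SELECT a.typname
FROM pg_type a
JOIN pg_type e ON a.oid = e.typarray
WHERE e.typtype = 'e';
```

Each returned typname would then be loaded and registered on the connection's type map before running queries that scan enum arrays.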

jackc avatar Oct 11 '25 17:10 jackc