Postgres returns "numeric" type for "sum_" over Int64 column causing "ColumnTypeMismatch"-error
I have a table whose Beam table definition contains an Int64 field, which is a BIGINT in the Postgres database. The field in question is called qty:
orderbooks=> SELECT pg_typeof(qty) FROM path_qtys LIMIT 1;
pg_typeof
-----------
bigint
(1 row)
However, when I do a sum_ over this field, Postgres returns a numeric for this column:
orderbooks=> SELECT SUM(qty), pg_typeof(SUM(qty)) FROM path_qtys GROUP BY path__id LIMIT 1;
sum | pg_typeof
-----+-----------
0 | numeric
(1 row)
In the Beam/Haskell code for the sum_ expression, the return type is Int64, which results in this query failing with the error:
BeamRowReadError {
brreColumn = Just 6,
brreError = ColumnTypeMismatch {
ctmHaskellType = "Integer",
ctmSQLType = "numeric",
ctmMessage = "types incompatible"
}
}
A temporary workaround is to cast_ the result of the sum_ operation to bigint.
The type of sum_ lets you choose the resulting integral type:
sum_ :: ( BeamSqlBackend be, Num a )
=> QExpr be s a -> QAgg be s (Maybe a)
Ideally we'd make it dependent on the input, but this would be tricky since it's backend-specific. Since sum over bigints gives you a numeric, you'll need to annotate your Haskell code to make it use Scientific instead of Int64. as_ is provided for precisely this purpose.
Hi @kmicklas
Unfortunately I can't get as_ to work as you suggest. sum_ indeed supports operations on all Num types, but it doesn't let me choose the return type, since the return type is equal to the input column type (given by the a in the QExpr be s a argument). In my case the input column type is Int64 (pathqtyQty):
data PathQtyT f
= PathQty
{ pathqtyCalc :: PrimaryKey Calc.CalculationT f
, pathqtyPath :: PrimaryKey Path.PathT f
, pathqtyQty :: C f Int64
, pathqtyPriceLow :: C f Double
, pathqtyPriceHigh :: C f Double
} deriving Generic
So the expression \pathQty' -> as_ @Scientific $ fromMaybe_ (val_ 0) (sum_ $ PathQty.pathqtyQty pathQty') is ill-typed: as_ @Scientific expects an expression that is already Scientific (its type is effectively Scientific -> Scientific, to avoid ambiguity), while the column type of fromMaybe_ (val_ 0) (sum_ $ PathQty.pathqtyQty pathQty') is Int64.
This is in alignment with my understanding that the only use for as_ is for resolving ambiguity. A cast is required, as far as I can see.
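For concreteness, here is a minimal sketch of what such a cast could look like in the aggregation itself. It assumes a hypothetical database accessor pathQtys on some db :: DatabaseSettings Postgres Db (names not from this thread), and it assumes bigint is the DataType helper re-exported from Database.Beam; the exact constraints and exports may differ between beam versions:
import Data.Int (Int64)
import Database.Beam
import Database.Beam.Postgres (Postgres)

-- Generates roughly:
--   SELECT "path__id", CAST(COALESCE(SUM("qty"), 0) AS BIGINT)
--   FROM "path_qtys" GROUP BY "path__id"
sumQtyPerPath =
  aggregate_
    (\pathQty ->
       ( group_ (PathQty.pathqtyPath pathQty)
         -- the outermost cast_ makes Postgres return BIGINT for the whole
         -- expression, so beam's Int64 column parser accepts it again
       , cast_ (fromMaybe_ (val_ 0) (sum_ (PathQty.pathqtyQty pathQty)))
               (bigint :: DataType Postgres Int64)
       ))
    (all_ (pathQtys db))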
Doh, my bad. I did not read that type signature closely and thought we were letting the result type vary independently of the input (like, e.g., count_). Unfortunately, changing this would be a breaking change.
The current best way to work around this is to use unsafeRetype_.
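To illustrate, here is a sketch of that workaround, reusing the hypothetical pathQtys/db accessors from the sketch above and assuming unsafeRetype_ is polymorphic in its result type (roughly QGenExpr ctxt be s a -> QGenExpr ctxt be s a'; check the installed beam version for its exact module and signature):
import Data.Scientific (Scientific)
import Database.Beam
import Database.Beam.Postgres (runBeamPostgres)
import Database.PostgreSQL.Simple (Connection)

-- Runs roughly: SELECT COALESCE(SUM("qty"), 0) FROM "path_qtys"
-- The generated SQL is unchanged, so Postgres still returns NUMERIC;
-- unsafeRetype_ only changes the Haskell-side type, so beam picks the
-- Scientific parser instead of the Int64 one.
totalQty :: Connection -> IO (Maybe Scientific)
totalQty conn =
  runBeamPostgres conn $
    runSelectReturningOne $ select $
      aggregate_
        (\pathQty ->
           fromMaybe_ (val_ 0)
                      (unsafeRetype_ (sum_ (PathQty.pathqtyQty pathQty))))
        (all_ (pathQtys db))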
The behavior you're seeing is incredibly annoying, but it's apparently correct and documented in Postgres: https://www.postgresql.org/docs/13/functions-aggregate.html
(Apparently Postgres thinks it's safe to add as many 32-bit ints as you like into a 64-bit accumulator without overflow?)
I can think of a few options for what to do, but I'm not sure I like any of them. One possibility is to make the deserializers for the integral types more forgiving in the Postgres backend.
@tathougies I think the most sensible option in the short term is to make the deserializer more forgiving. This approach would trade a runtime error for degraded performance (since we no longer know at compile time which type to deserialize). I think this is a sensible tradeoff, regardless of whether we later solve it properly by changing the return types of sum_ and avg_ (so that the right deserializer is known at compile time).
Reading the Postgres docs, the relevant conversions are:
- sum ( smallint ) → bigint
- sum ( integer ) → bigint
- sum ( bigint ) → numeric
- avg ( smallint ) → numeric
- avg ( integer ) → numeric
- avg ( real ) → double precision
Question: does the beam parser allow looking at the column type of a query result in order to figure out which parser to use?
If not, we could change the default parsers to be more lenient. This means that for e.g. smallint, the deserializer would first try smallint, then fall back to bigint, and then to numeric. In summary, this would require creating a series of "lenient" parsers of the following form:
- smallintLenient: smallint <|> bigint <|> numeric
- integerLenient: integer <|> bigint <|> numeric
- bigintLenient: bigint <|> numeric
- realLenient: real <|> double precision
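To make the idea concrete, here is a rough sketch of the bigintLenient case, expressed in terms of postgresql-simple's FromField machinery (which beam-postgres's row parsing roughly builds on). It assumes Conversion's Alternative instance and introduces a hypothetical LenientInt64 wrapper; beam would need to wire something like this into its own FromBackendRow instances:
import Control.Applicative ((<|>))
import Data.Int (Int64)
import Data.Scientific (Scientific, toBoundedInteger)
import Database.PostgreSQL.Simple.FromField
  (Conversion, FromField (..), ResultError (..), returnError)

-- Hypothetical lenient Int64 parser: accept BIGINT as before, but also
-- accept NUMERIC (as produced by SUM over a bigint column) as long as the
-- value still fits in an Int64.
newtype LenientInt64 = LenientInt64 Int64

instance FromField LenientInt64 where
  fromField f mdata =
        (LenientInt64 <$> (fromField f mdata :: Conversion Int64))
    <|> do sci <- fromField f mdata :: Conversion Scientific
           case toBoundedInteger sci of
             Just i  -> pure (LenientInt64 i)
             Nothing -> returnError ConversionFailed f "numeric result does not fit in Int64"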
Additionally, we should add a note for sum_ and avg_ that using unsafeRetype_ to cast the result (according to the Postgres docs) will avoid the performance penalty.
This is a duplicate of #324. But I think we should close #324 and keep this issue open since this issue contains more information and better explains where the bug comes from.
FWIW, I got this work-around working:
runSelectReturningOne $ select $ do
  aggregate_ (\u -> sum_ (cast_ (_qty u) (numeric Nothing))) $ all_ tableName
which gets translated into the following with the Postgres backend:
SELECT SUM(CAST(("t0"."qty") AS NUMERIC))
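If the caller still wants an Int64 after this NUMERIC-returning workaround, the Scientific value can be converted back on the Haskell side. A small sketch using Data.Scientific.toBoundedInteger (which returns Nothing if the value does not fit in the target type):
import Data.Int (Int64)
import Data.Scientific (Scientific, toBoundedInteger)

-- Convert the NUMERIC sum back to Int64, guarding against overflow.
sumToInt64 :: Scientific -> Either String Int64
sumToInt64 s =
  maybe (Left ("sum out of Int64 range: " <> show s)) Right (toBoundedInteger s)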
In case it helps anyone, I'm using cast_ like so:
https://github.com/runeksvendsen/crypto-liquidity-db/blob/c85c7bff83cd18df26f5bcb777fa9ecd2090b137/src/Query/Liquidity.hs#L175