fakesnow icon indicating copy to clipboard operation
fakesnow copied to clipboard

Wrong numeric type returned

Open sk- opened this issue 9 months ago • 4 comments

We have been trying to adopt fakesnow in our project to avoid having to run all tests against a live Snowflake instance. In this process we found an issue related to how numeric types are returned. Our code was expecting ints but was getting back a decimal.Decimal.

I narrowed down this to the following info:

  • SUM in duckdb returns a HUGEINT (int128) see playground
  • HUGEINTS do not seem to be converted properly in fakesnow
  • snowflake connector has logic to convert to int

Outputs with fakesnow:

SELECT SUM(x) FROM (SELECT 1 as x)

returns a decimal.Decimal in python

SELECT SUM(x)::BIGINT FROM (SELECT 1 as x)

returns an int in python

sk- avatar Mar 20 '25 18:03 sk-

Nice analysis. The same logic you found in python for converting to int when scale=0 is also present in the C++ arrow reader.

A comprehensive solution would be to run the C++ snowflake connector logic over the arrow table coming from duckdb.

One idea would be to cast SUM to bigint, but that won't work when the values are doubles.

FYI - we've had to fix integer_precision before in #12 and #193.

tekumara avatar Mar 21 '25 07:03 tekumara

The arrow type being returned by a Snowflake instance here is pyarrow.lib.Int64Array, which is before the snowflake connector does any transformation.

❯ pbpaste | python -m tools.decode
Total fields: 1
==================================================
Field 0: SUM(X)
  Type: int64
  Nullable: True
  Metadata:
    logicalType: FIXED
    precision: 13
    scale: 0
    charLength: 0
    byteLength: 8
    finalType: T

  Batch data: [
  1
]
  Batch data type: <class 'pyarrow.lib.Int64Array'>
==================================================

At the moment fakesnow server is returning pyarrow.lib.Decimal128Array which the snowflake connector logic transforms into a python int. So if you use the fakesnow server you won't have this problem.

Nonetheless, we need to fix non-server mode, and should probably return pyarrow.lib.Int64Array from fakesnow for correctness.

tekumara avatar Mar 21 '25 07:03 tekumara

Thanks for the followup.

What's your position regarding server mode vs pytest mode? (We could discuss it in other issue if you prefer)

Long term I'd prefer to use the server (via a docker image), as that would allow to use it both in tests and local development. Also in tests that allow to exercise the snowflake-connector as well, and introspect the DB when running tests.

sk- avatar Mar 21 '25 12:03 sk-

A docker image is a priority - feel free to create an issue for that.

The server version is at about 90% parity with the non-server (ie: patching) version. One of the main gaps is it doesn't support write_pandas yet. The goal is to get it to parity.

tekumara avatar Mar 21 '25 23:03 tekumara