embucket-labs icon indicating copy to clipboard operation
embucket-labs copied to clipboard

Align aggregate return type with Snowflake

Open rampage644 opened this issue 7 months ago • 2 comments

statement ok
CREATE OR REPLACE TABLE sum_example(k INT, d DECIMAL(10,5),
                                    s1 VARCHAR(10), s2 VARCHAR(10));

statement ok
INSERT INTO sum_example VALUES
  (1, 1.1, '1.1','one'),
  (1, 10, '10','ten'),
  (2, 2.2, '2.2','two'),
  (2, null, null,'null'),
  (3, null, null, 'null'),
  (null, 9, '9.9','nine');

statement ok
SELECT * FROM sum_example;

query TT
SELECT SUM(d), SUM(CAST(s1 AS DECIMAL(10,5))) FROM sum_example
----
22.30000	23.20000

returns incorrectly formatted numbers:

================================================================================
Expected result:
22.30000	23.20000
================================================================================
Actual result:
22.300000000000000710542735760100185871124267578125	23.199999999999999289457264239899814128875732421875

rampage644 avatar Apr 30 '25 20:04 rampage644

int is a syntax sugar for number(38, 0) in snowflake and Embucket seems to understand int as DataFusion's int and this is not a correct behavior and same problem with decimal and etc. all numeric data types.

we need a serious data types overhaul #741

camuel avatar May 12 '25 19:05 camuel

Upon investigating the issue more closely it seems @camuel was right this is data type related.

The corollary is we need more tests for logical expressions and function type coercion.

This issue might need an addition of an AnalyzerRule to pin desired behaviour OR figure out it is not possible and we need more deep datafusion change. Logical types perhaps?

rampage644 avatar Jun 15 '25 01:06 rampage644