velox
velox copied to clipboard
Comparisons for values of logical types are not handled correctly throughout the library
Bug description
TIMESTAMP WITH TIME ZONE logical type is backed by BIGINT physical type. The timestamp values are stored in memory as 64-bit integers using an encoding that doesn't allow for direct comparisons of these integers.
https://facebookincubator.github.io/velox/develop/timestamp.html
TimestampWithTimezone physically packs two integers in a single 64 word, using 12 bits for timezone ID, and 52 bits for a millisecond-precision timestamp.
However, may places in the core engine are applying equality and comparisons to the physical value without considering its logical semantics.
One example, is aggregation with grouping keys of type TIMESTAMP WITH TIME ZONE returns incorrect result. '2024-04-10 10:00 America/New_York' and '2024-04-10 07:00 America/Los_Angeles' represent the same timestamp, but appear as different groups in aggregation results:
auto data = makeRowVector({
makeFlatVector<std::string>({
"2024-04-10 10:00 America/New_York",
"2024-04-10 07:00 America/Los_Angeles",
}),
});
auto plan = PlanBuilder()
.values({data})
.project({"cast(c0 as timestamp with time zone)"})
.singleAggregation({"p0"}, {"count(1)"})
.project({"cast(p0 as varchar)", "a0"})
.planNode();
auto results = AssertQueryBuilder(plan).copyResults(pool());
LOG(ERROR) << results->toString();
LOG(ERROR) << results->toString(0, 10);
[ROW ROW<p0:VARCHAR,a0:BIGINT>: 2 elements, no nulls]
0: {2024-04-10 10:00:00.000 America/New_York, 1}
1: {2024-04-10 07:00:00.000 America/Los_Angeles, 1}
In Presto,
presto:di> select cast(x as timestamp with time zone), count(1) from unnest(array['2024-04-10 07:00 America/Los_Angeles', '2024-04-10 10:00 America/New_York']) as t(x) group by 1;
_col0 | _col1
---------------------------------------------+-------
2024-04-10 07:00:00.000 America/Los_Angeles | 2
(1 row)
presto:di> select x, count(1) from unnest(array['2024-04-10 07:00 America/Los_Angeles', '2024-04-10 10:00 America/New_York']) as t(x) group by 1;
x | _col1
--------------------------------------+-------
2024-04-10 07:00 America/Los_Angeles | 1
2024-04-10 10:00 America/New_York | 1
(2 rows)
All operators that perform comparisons are affected by this issue, e.g. Aggregation, Join, OrderBy.
CC: @kgpai @Yuhta @bikramSingh91 @kagamiori @amitkdutta
System information
n/a
Relevant logs
No response