pinot
pinot copied to clipboard
Add optional 5th parameter - bucketing time zone - to dateTimeConvet() function
This PR adds an optional parameter to dateTimeConvert() function - bucketing time zone. It allows specifying specific time zone to bucketing, regardless of output time zone - which could be epoch millis or a formatted string.
Examples:
select dateTimeConvert( 86400001,
'1:MILLISECONDS:EPOCH',
'1:MILLISECONDS:SIMPLE_DATE_FORMAT:yyyy-MM-dd HH:mm:ss.SSSZ',
'1:MILLISECONDS' )
from dual
returns 1970-01-02 00:00:00.001+0000
select dateTimeConvert( 86400001,
'1:MILLISECONDS:EPOCH',
'1:MILLISECONDS:EPOCH',
'1:DAYS',
'CET') from dual
returns 82800000
select dateTimeConvert( 86400001,
'1:MILLISECONDS:EPOCH',
'1:MILLISECONDS:SIMPLE_DATE_FORMAT:yyyy-MM-dd HH:mm:ss.SSSZ',
'1:DAYS',
'CET')
from dual
returns 1970-01-01 23:00:00.000+0000 because 1970-01-02 00:00:00.001+0000 is rounded to 1970-01-02 00:00:00.001+0200 in CET time zone and then converted to UTC.
select dateTimeConvert( 86400001,
'1:MILLISECONDS:EPOCH',
'1:MILLISECONDS:SIMPLE_DATE_FORMAT:yyyy-MM-dd HH:mm:ss.SSS tz(America/Los_Angeles)',
'1:DAYS',
'CET')
from dual
returns 1970-01-01 15:00:00.000
Note: when bucketing time zone is set, bucketing is performed relative to bigger time unit (e.g. 1 day when bucketing to 1 hour ) and not epoch start.
Fixes #8581
I ran the following queries with BenchmarkQueries class, 15 mil rows: A. using new timezone parameter
SELECT *
FROM MyTable
WHERE dateTimeConvert(TSTMP_COL, '1:MILLISECONDS:EPOCH', '1:MILLISECONDS:EPOCH', '1:DAYS', 'CET') = 120000000
B. workaround
SELECT *
FROM MyTable
WHERE FromDateTime(dateTimeConvert(TSTMP_COL, '1:MILLISECONDS:EPOCH', '1:DAYS:SIMPLE_DATE_FORMAT:yyyy-MM-dd HH:mm:ss.SSSZ tz(CET)', '1:DAYS'), 'yyyy-MM-dd HH:mm:ss.SSSZ') = 120000000
that shows new implementation is about 30 times faster than workaround:
Benchmark (_numRows) (_query) (_scenario) Mode Cnt Score Error Units BenchmarkQueries.query 15000000 A EXP(0.001) avgt 5 479.288 ± 162.748 ms/op BenchmarkQueries.query 15000000 A EXP(0.5) avgt 5 492.584 ± 94.075 ms/op BenchmarkQueries.query 15000000 A EXP(0.999) avgt 5 496.491 ± 156.267 ms/op
vs
Benchmark (_numRows) (_query) (_scenario) Mode Cnt Score Error Units BenchmarkQueries.query 15000000 B EXP(0.001) avgt 5 15010.526 ± 10.270 ms/op BenchmarkQueries.query 15000000 B EXP(0.5) avgt 5 14991.342 ± 112.595 ms/op BenchmarkQueries.query 15000000 B EXP(0.999) avgt 5 14519.129 ± 490.620 ms/op
Also - changed TimePredicateFilterOptimizer to ignore calls (similarly to those with SDF output) to dateTimeConvert with 5 arguments, instead of failing queries due to unexpected number of function arguments.
Codecov Report
Attention: Patch coverage is 85.78947% with 27 lines in your changes missing coverage. Please review.
Project coverage is 63.77%. Comparing base (
59551e4) to head (7614427). Report is 1265 commits behind head on master.
Additional details and impacted files
@@ Coverage Diff @@
## master #14298 +/- ##
============================================
+ Coverage 61.75% 63.77% +2.02%
- Complexity 207 1556 +1349
============================================
Files 2436 2660 +224
Lines 133233 145813 +12580
Branches 20636 22311 +1675
============================================
+ Hits 82274 92998 +10724
- Misses 44911 45955 +1044
- Partials 6048 6860 +812
| Flag | Coverage Δ | |
|---|---|---|
| custom-integration1 | 100.00% <ø> (+99.99%) |
:arrow_up: |
| integration | 100.00% <ø> (+99.99%) |
:arrow_up: |
| integration1 | 100.00% <ø> (+99.99%) |
:arrow_up: |
| integration2 | 0.00% <ø> (ø) |
|
| java-11 | 63.76% <85.78%> (+2.05%) |
:arrow_up: |
| java-21 | 63.66% <85.78%> (+2.04%) |
:arrow_up: |
| skip-bytebuffers-false | 63.77% <85.78%> (+2.02%) |
:arrow_up: |
| skip-bytebuffers-true | 63.64% <85.78%> (+35.91%) |
:arrow_up: |
| temurin | 63.77% <85.78%> (+2.02%) |
:arrow_up: |
| unittests | 63.77% <85.78%> (+2.02%) |
:arrow_up: |
| unittests1 | 55.47% <85.78%> (+8.58%) |
:arrow_up: |
| unittests2 | 34.16% <0.52%> (+6.43%) |
:arrow_up: |
Flags with carried forward coverage won't be shown. Click here to find out more.
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.
Can you elaborate more on this:
Note: when bucketing time zone is set, bucketing is performed relative to bigger time unit (e.g. 1 day when bucketing to 1 hour ) and not epoch start.
Sure, when bucketing time zone is used then it behaves more like with SDF output before the change - e.g. date is truncated to 5 hours within the day and not to 5 hours counting hours since start of epoch .