pinot icon indicating copy to clipboard operation
pinot copied to clipboard

Add optional 5th parameter - bucketing time zone - to dateTimeConvet() function

Open bziobrowski opened this issue 1 year ago • 1 comments

This PR adds an optional parameter to dateTimeConvert() function - bucketing time zone. It allows specifying specific time zone to bucketing, regardless of output time zone - which could be epoch millis or a formatted string.

Examples:

select dateTimeConvert( 86400001, 
                        '1:MILLISECONDS:EPOCH', 
                        '1:MILLISECONDS:SIMPLE_DATE_FORMAT:yyyy-MM-dd HH:mm:ss.SSSZ',
                        '1:MILLISECONDS' ) 
from dual

returns 1970-01-02 00:00:00.001+0000

select dateTimeConvert( 86400001,
                        '1:MILLISECONDS:EPOCH', 
                        '1:MILLISECONDS:EPOCH', 
                        '1:DAYS', 
                        'CET') from dual

returns 82800000

select dateTimeConvert( 86400001, 
                        '1:MILLISECONDS:EPOCH', 
                        '1:MILLISECONDS:SIMPLE_DATE_FORMAT:yyyy-MM-dd HH:mm:ss.SSSZ', 
                        '1:DAYS', 
                        'CET') 
from dual 

returns 1970-01-01 23:00:00.000+0000 because 1970-01-02 00:00:00.001+0000 is rounded to 1970-01-02 00:00:00.001+0200 in CET time zone and then converted to UTC.

select dateTimeConvert( 86400001, 
                        '1:MILLISECONDS:EPOCH', 
                        '1:MILLISECONDS:SIMPLE_DATE_FORMAT:yyyy-MM-dd HH:mm:ss.SSS tz(America/Los_Angeles)', 
                        '1:DAYS', 
                        'CET') 
from dual

returns 1970-01-01 15:00:00.000

Note: when bucketing time zone is set, bucketing is performed relative to bigger time unit (e.g. 1 day when bucketing to 1 hour ) and not epoch start.

Fixes #8581

I ran the following queries with BenchmarkQueries class, 15 mil rows: A. using new timezone parameter

SELECT * 
FROM MyTable 
WHERE  dateTimeConvert(TSTMP_COL, '1:MILLISECONDS:EPOCH', '1:MILLISECONDS:EPOCH', '1:DAYS', 'CET') = 120000000

B. workaround

SELECT * 
FROM MyTable 
WHERE FromDateTime(dateTimeConvert(TSTMP_COL, '1:MILLISECONDS:EPOCH', '1:DAYS:SIMPLE_DATE_FORMAT:yyyy-MM-dd HH:mm:ss.SSSZ tz(CET)', '1:DAYS'), 'yyyy-MM-dd HH:mm:ss.SSSZ') = 120000000

that shows new implementation is about 30 times faster than workaround: Benchmark (_numRows) (_query) (_scenario) Mode Cnt Score Error Units BenchmarkQueries.query 15000000 A EXP(0.001) avgt 5 479.288 ± 162.748 ms/op BenchmarkQueries.query 15000000 A EXP(0.5) avgt 5 492.584 ± 94.075 ms/op BenchmarkQueries.query 15000000 A EXP(0.999) avgt 5 496.491 ± 156.267 ms/op vs Benchmark (_numRows) (_query) (_scenario) Mode Cnt Score Error Units BenchmarkQueries.query 15000000 B EXP(0.001) avgt 5 15010.526 ± 10.270 ms/op BenchmarkQueries.query 15000000 B EXP(0.5) avgt 5 14991.342 ± 112.595 ms/op BenchmarkQueries.query 15000000 B EXP(0.999) avgt 5 14519.129 ± 490.620 ms/op

Also - changed TimePredicateFilterOptimizer to ignore calls (similarly to those with SDF output) to dateTimeConvert with 5 arguments, instead of failing queries due to unexpected number of function arguments.

bziobrowski avatar Oct 24 '24 18:10 bziobrowski

Codecov Report

Attention: Patch coverage is 85.78947% with 27 lines in your changes missing coverage. Please review.

Project coverage is 63.77%. Comparing base (59551e4) to head (7614427). Report is 1265 commits behind head on master.

Files with missing lines Patch % Lines
.../transformer/datetime/BaseDateTimeTransformer.java 73.86% 21 Missing and 2 partials :warning:
.../pinot/common/function/scalar/DateTimeConvert.java 91.83% 2 Missing and 2 partials :warning:
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #14298      +/-   ##
============================================
+ Coverage     61.75%   63.77%   +2.02%     
- Complexity      207     1556    +1349     
============================================
  Files          2436     2660     +224     
  Lines        133233   145813   +12580     
  Branches      20636    22311    +1675     
============================================
+ Hits          82274    92998   +10724     
- Misses        44911    45955    +1044     
- Partials       6048     6860     +812     
Flag Coverage Δ
custom-integration1 100.00% <ø> (+99.99%) :arrow_up:
integration 100.00% <ø> (+99.99%) :arrow_up:
integration1 100.00% <ø> (+99.99%) :arrow_up:
integration2 0.00% <ø> (ø)
java-11 63.76% <85.78%> (+2.05%) :arrow_up:
java-21 63.66% <85.78%> (+2.04%) :arrow_up:
skip-bytebuffers-false 63.77% <85.78%> (+2.02%) :arrow_up:
skip-bytebuffers-true 63.64% <85.78%> (+35.91%) :arrow_up:
temurin 63.77% <85.78%> (+2.02%) :arrow_up:
unittests 63.77% <85.78%> (+2.02%) :arrow_up:
unittests1 55.47% <85.78%> (+8.58%) :arrow_up:
unittests2 34.16% <0.52%> (+6.43%) :arrow_up:

Flags with carried forward coverage won't be shown. Click here to find out more.

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

codecov-commenter avatar Oct 24 '24 19:10 codecov-commenter

Can you elaborate more on this:

Note: when bucketing time zone is set, bucketing is performed relative to bigger time unit (e.g. 1 day when bucketing to 1 hour ) and not epoch start.

Jackie-Jiang avatar Oct 28 '24 22:10 Jackie-Jiang

Sure, when bucketing time zone is used then it behaves more like with SDF output before the change - e.g. date is truncated to 5 hours within the day and not to 5 hours counting hours since start of epoch .

bziobrowski avatar Oct 28 '24 22:10 bziobrowski