tajo icon indicating copy to clipboard operation
tajo copied to clipboard

TAJO-1925: Improve hive compatibility with TIMESTAMP partition column.

Open blrunner opened this issue 9 years ago • 3 comments

This patch contains following modifications:

  • When making partition directories, use partition name which are accepted in the format YYYY-MM-DD HH:MM:SS.MS.
  • When pruning partitions, convert UTC of EvalNode to customer timezone.
  • When building partition filter in catalog, convert UTC of algebra expression to customer timezone.
  • Add timestamp cast operation with the number of milliseconds.

And I found that the patch ran successfully with MySQLStore and HiveCatalogStore.

blrunner avatar Nov 06 '15 07:11 blrunner

I fixed some bugs and added description about timezone with hive. For the reference, I tested the patch with tpc-h 1g dataset as following:

  • on Hive
hive> CREATE TABLE timeline(event string) PARTITIONED BY (happened timestamp);

hive> insert overwrite table timeline partition(happened=1446608033000) select n_name from nation where n_nationkey = 0;

hive> insert overwrite table timeline partition(happened=1446608037000) select n_name from nation where n_nationkey = 1;

hive> insert overwrite table timeline partition(happened=854419808037) select n_name from nation where n_nationkey = 2;

hive> insert overwrite table timeline partition(happened=1446712555983) select n_name from nation where n_nationkey = 3;

hive> select * from timeline;
OK
BRAZIL  1997-01-28 11:50:08.037
ALGERIA 2015-11-04 12:33:53
ARGENTINA   2015-11-04 12:33:57
CANADA  2015-11-05 17:35:55.983
  • on Tajo with HiveCatalogStore
default> SET TIME ZONE 'Asia/Seoul';

default> select * from timeline;
event,  happened
-------------------------------
BRAZIL,  1997-01-28 11:50:08.037
ALGERIA,  2015-11-04 12:33:53
ARGENTINA,  2015-11-04 12:33:57
CANADA,  2015-11-05 17:35:55.983
(4 rows, 0.092 sec, 0 B selected)

default> select * from timeline where happened = TIMESTAMP '2015-11-04 12:33:57';
event,  happened
-------------------------------
ARGENTINA,  2015-11-04 12:33:57
(1 rows, 0.126 sec, 0 B selected)

default> insert into timeline select n_name, TIMESTAMP '2015-11-06 14:45:48.066' from nation where n_nationkey = 10;
default> insert into timeline select n_name, TIMESTAMP '2015-11-06 14:45:48.75' from nation where n_nationkey = 11;

default> select * from timeline;
event,  happened
-------------------------------
BRAZIL,  1997-01-28 11:50:08.037
ALGERIA,  2015-11-04 12:33:53
ARGENTINA,  2015-11-04 12:33:57
CANADA,  2015-11-05 17:35:55.983
IRAN,  2015-11-06 14:45:48.066
IRAQ,  2015-11-06 14:45:48.75
(6 rows, 0.087 sec, 0 B selected)
  • on Tajo with MySQLStore
default> SET TIME ZONE 'Asia/Seoul';

default> CREATE EXTERNAL TABLE default.timeline (event TEXT) USING TEXT
PARTITION BY COLUMN(happened TIMESTAMP) LOCATION 'hdfs://localhost:9010/user/hive/warehouse/timeline';

default> select * from timeline where happened = TIMESTAMP '1997-01-28 11:50:08.037';
event,  happened
-------------------------------
BRAZIL,  1997-01-28 11:50:08.037
(1 rows, 0.074 sec, 0 B selected)

default> select * from timeline where happened = TIMESTAMP '2015-11-04 12:33:57';
event,  happened
-------------------------------
ARGENTINA,  2015-11-04 12:33:57
(1 rows, 0.033 sec, 0 B selected)

default> alter table timeline repair partition;

default> insert into timeline select n_name, TIMESTAMP '2015-11-09 16:47:45.012' from nation where n_nationkey = 24;

default> select * from timeline;
event,  happened
-------------------------------
BRAZIL,  1997-01-28 11:50:08.037
ALGERIA,  2015-11-04 12:33:53
ARGENTINA,  2015-11-04 12:33:57
CANADA,  2015-11-05 17:35:55.983
IRAN,  2015-11-06 14:45:48.066
IRAQ,  2015-11-06 14:45:48.75
UNITED STATES,  2015-11-09 16:47:45.012
(7 rows, 0.036 sec, 0 B selected)

default> select * from timeline where happened = TIMESTAMP '2015-11-09 16:47:45.012';
event,  happened
-------------------------------
UNITED STATES,  2015-11-09 16:47:45.012

blrunner avatar Nov 09 '15 08:11 blrunner

Rebased. Basically, Tajo column partition focus on Hive partition compatibility. Currently, Hive just created timestamp partition directories using date format. I think that Tajo just need to follow current Hive version rule. Because even if we implement long value time for timestamp partition directory, Hive can't handle it.

blrunner avatar Jan 13 '16 03:01 blrunner

Hi @blrunner,

It has been long time. IMO, This patch needs to be included in 0.12.0 release. If you are Ok, could your rebase it? Otherwise, I can do it.

hyunsik avatar Dec 04 '17 01:12 hyunsik