
Add support for Pig datetimes

Open asfimport opened this issue 11 years ago • 15 comments

There's currently no support for conversion to/from Pig datetimes.

Reporter: Christian Rolf / @ccrolf

Related issues:

Original Issue Attachments:

Note: This issue was originally created as PARQUET-137. Please see the migration documentation for further details.

asfimport avatar Nov 20 '14 10:11 asfimport

Brock Noland / @brockn: This uses the NanoTime class. We should probably fix that implementation as it's wrong. See PARQUET-114.

asfimport avatar Nov 20 '14 17:11 asfimport

Brock Noland / @brockn: Hi @ccrolf,

Long story short:

  1. The NanoTime class was implemented incorrectly, as described in PARQUET-114.
  2. As the package name of the class indicates, it was implemented as an example. Users are expected to implement their own class.

Brock

asfimport avatar Nov 20 '14 17:11 asfimport

Ryan Blue / @rdblue: Thanks for looking at this, Christian! Brock is right that NanoTime is for demo purposes only. In fact, I wouldn't recommend building your own copy of it either, because the "timestamp" it works with is undocumented and uses an int96 without an annotation. We've been looking at this problem lately, and we have both defined type annotations and specified how they should be interpreted. The next step is to implement those types in the object models, like you've done here. In fact, this will be the first implementation.

The specification for date/time types is on the LogicalTypes page. If you need any help with the spec, feel free to ask questions and I'll clarify.
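For readers unfamiliar with the int96 "timestamp" mentioned above, here is a minimal sketch of the layout that Hive and Impala conventionally write: 8 little-endian bytes of nanoseconds within the day, followed by 4 little-endian bytes holding the Julian day number. This is an illustration of that convention only, not the NanoTime class and not the annotated types from the spec; the class and method names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.time.Instant;

// Hypothetical helper, not part of Parquet: illustrates the conventional
// (unannotated) int96 timestamp layout used by Hive and Impala.
final class Int96TimestampSketch {
    // Julian day number of 1970-01-01, the Unix epoch.
    static final long JULIAN_DAY_OF_EPOCH = 2_440_588L;
    static final long SECONDS_PER_DAY = 86_400L;
    static final long NANOS_PER_SECOND = 1_000_000_000L;

    // Decode 12 bytes: nanos-of-day (8 bytes LE), then Julian day (4 bytes LE).
    static Instant toInstant(byte[] int96) {
        if (int96.length != 12) {
            throw new IllegalArgumentException("expected 12 bytes, got " + int96.length);
        }
        ByteBuffer buf = ByteBuffer.wrap(int96).order(ByteOrder.LITTLE_ENDIAN);
        long nanosOfDay = buf.getLong();
        int julianDay = buf.getInt();
        long epochSeconds = (julianDay - JULIAN_DAY_OF_EPOCH) * SECONDS_PER_DAY;
        return Instant.ofEpochSecond(epochSeconds, nanosOfDay);
    }

    // Encode an Instant back into the same 12-byte layout.
    static byte[] fromInstant(Instant instant) {
        long epochDay = Math.floorDiv(instant.getEpochSecond(), SECONDS_PER_DAY);
        long secondOfDay = Math.floorMod(instant.getEpochSecond(), SECONDS_PER_DAY);
        long nanosOfDay = secondOfDay * NANOS_PER_SECOND + instant.getNano();
        return ByteBuffer.allocate(12)
                .order(ByteOrder.LITTLE_ENDIAN)
                .putLong(nanosOfDay)
                .putInt((int) (epochDay + JULIAN_DAY_OF_EPOCH))
                .array();
    }

    public static void main(String[] args) {
        Instant original = Instant.parse("2016-10-17T20:22:33.232323434Z");
        byte[] encoded = fromInstant(original);
        // Round-trips without loss because the layout keeps full nanosecond precision.
        System.out.println(toInstant(encoded).equals(original));
    }
}
```

Because the value carries no annotation, a reader has to know this layout out of band, which is the problem the annotated date/time types are meant to solve.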

asfimport avatar Nov 20 '14 22:11 asfimport

Christian Rolf / @ccrolf: Thanks for the feedback! Sorry I didn't have time to look into this further for a long time. Looks like Parquet format 2 will have totally different date types. So there isn't much point in fixing this?

asfimport avatar Apr 29 '15 08:04 asfimport

Ryan Blue / @rdblue: Yeah, there's definitely value in making Pig work with the dates and times from the spec. Does Pig have date and time types as well?

asfimport avatar Apr 29 '15 16:04 asfimport

Christian Rolf / @ccrolf: OK, I will try to find time for it. Pig uses Joda-Time internally: http://pig.apache.org/docs/r0.14.0/basic.html#data-types

asfimport avatar May 05 '15 19:05 asfimport

Oleksiy Sayankin: Fixed the joda-time scope and version in the dependencies.

asfimport avatar Oct 21 '16 11:10 asfimport

Oleksiy Sayankin: Tested fix with Pig and Hive

STEP 1: Create Parquet data in Hive:

CREATE TABLE IF NOT EXISTS `test` (id int);
CREATE EXTERNAL TABLE `pig` (
  `campaignid` bigint,
  `siteid` bigint,
  `name` string,
  `lastupdated` timestamp,
  `created` timestamp,
  `active` boolean
) STORED AS PARQUET LOCATION '/user/test/pig';

Insert data.

INSERT OVERWRITE TABLE `test` VALUES (1),(2),(3),(4),(5),(6),(7),(8),(9),(10);
INSERT OVERWRITE TABLE `pig`
SELECT
  1,
  2,
  'sample',
  '2016-10-17 11:22:33.232323434',
  '2016-10-17 11:22:33.232323434',
  1
FROM `test`
LIMIT 10;

STEP 2: Load the data using Pig:

REGISTER /usr/pig/pig-0.16/contrib/piggybank/java/piggybank.jar;
parqData = LOAD '/user/test/pig/000000_0' USING parquet.pig.ParquetLoader('campaignid:long,siteid:long,name:chararray,lastupdated:datetime,created:datetime,active:boolean');
DUMP parqData;

EXPECTED RESULT:

(1,2,sample,2016-10-17T20:22:33.232Z,2016-10-17T20:22:33.232Z,true)
(1,2,sample,2016-10-17T20:22:33.232Z,2016-10-17T20:22:33.232Z,true)
(1,2,sample,2016-10-17T20:22:33.232Z,2016-10-17T20:22:33.232Z,true)
(1,2,sample,2016-10-17T20:22:33.232Z,2016-10-17T20:22:33.232Z,true)
(1,2,sample,2016-10-17T20:22:33.232Z,2016-10-17T20:22:33.232Z,true)
(1,2,sample,2016-10-17T20:22:33.232Z,2016-10-17T20:22:33.232Z,true)
(1,2,sample,2016-10-17T20:22:33.232Z,2016-10-17T20:22:33.232Z,true)
(1,2,sample,2016-10-17T20:22:33.232Z,2016-10-17T20:22:33.232Z,true)
(1,2,sample,2016-10-17T20:22:33.232Z,2016-10-17T20:22:33.232Z,true)
(1,2,sample,2016-10-17T20:22:33.232Z,2016-10-17T20:22:33.232Z,true)

Worked as expected.
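The expected output differs from the inserted literal in two ways: the fraction is cut to milliseconds and the hour is shifted. The first follows from Pig's datetime being a Joda-Time value, which holds millisecond precision. The second is consistent with the value being written in a session time zone nine hours behind UTC and printed in UTC, but the thread does not state the time zone, so the offset below is purely an assumption for illustration. A small sketch using `java.time` (not Joda-Time, and not the patch's code; the class and method names are hypothetical):

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.time.temporal.ChronoUnit;

// Hypothetical illustration of how a nanosecond-precision local timestamp
// ends up as a millisecond-precision UTC value.
final class PigDatetimePrecisionSketch {
    // Interpret the local timestamp at the given offset, then drop
    // everything below millisecond precision.
    static Instant toUtcMillis(LocalDateTime local, ZoneOffset writerOffset) {
        return local.toInstant(writerOffset).truncatedTo(ChronoUnit.MILLIS);
    }

    public static void main(String[] args) {
        // The literal inserted in STEP 1.
        LocalDateTime written = LocalDateTime.parse("2016-10-17T11:22:33.232323434");
        // ASSUMPTION: the writer's offset is not given in the thread.
        ZoneOffset assumedOffset = ZoneOffset.ofHours(-9);
        System.out.println(toUtcMillis(written, assumedOffset));
        // prints 2016-10-17T20:22:33.232Z
    }
}
```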

asfimport avatar Oct 21 '16 11:10 asfimport

Oleksiy Sayankin: Hi all!

Can anybody review the patch and apply it? Our customer is suffering...

Thanks in advance.

asfimport avatar Nov 28 '16 18:11 asfimport

Ryan Blue / @rdblue: Thanks, [~osayankin]! I didn't realize there was a patch to review here. We'll take a look.

Could you open a pull request on github for this?

asfimport avatar Nov 28 '16 19:11 asfimport

Oleksiy Sayankin: Hi @rdblue.

I am not a contributor at https://github.com/Parquet/parquet-mr, so I cannot create a separate branch, and hence cannot open a pull request for merging: I do not have enough permissions.

So either I need to get permission to create a new branch, or I need to ask someone who has it to create the branch for me and apply the changes from the patch.

asfimport avatar Nov 29 '16 13:11 asfimport

Oleksiy Sayankin: PS: I expected the patch to be applied automatically if it was well formatted. I expected something like this:

PARQUET-<JIRA-NUMBER>[.][-].patch

asfimport avatar Nov 29 '16 14:11 asfimport

Ryan Blue / @rdblue: You can open a pull request from your own repository. Just push a branch to your github fork and open a PR for it from there.

You may want to make sure you forked from https://github.com/apache/parquet-mr so you don't have to select that one manually. We no longer use the old repository.

asfimport avatar Nov 29 '16 16:11 asfimport

Oleksiy Sayankin: Done: https://github.com/apache/parquet-mr/pull/387

asfimport avatar Nov 30 '16 12:11 asfimport

Viraj Bhat: [~osayankin] @rdblue, is the patch being resubmitted after the comments on GitHub? Is this being fixed elsewhere that I do not know of? Viraj

asfimport avatar Apr 05 '17 20:04 asfimport