dfs-datastores

add parquet support

Open sorenmacbeth opened this issue 10 years ago • 7 comments

I'd like to add parquet support in addition to sequence files.

sorenmacbeth avatar Apr 02 '14 05:04 sorenmacbeth

I'd like to have this too. I haven't used Parquet yet, but it seems it would speed up most of my queries. Do you have a list of the work that needs to be done?

jeroenvandijk avatar Apr 15 '14 11:04 jeroenvandijk

No, I don't have a list. I have a branch with a skeleton for it locally. The pail storage format is abstracted, so it is essentially just a matter of copying what is there for sequence files, but using Parquet files instead.
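
To make the idea concrete, here is a minimal sketch of what a pluggable record format can look like. Everything in it is an assumption for illustration: `StorageFormat`, `RecordWriter`, `RecordReader`, and `LengthPrefixedFormat` are hypothetical stand-ins, not the actual dfs-datastores interfaces, and the length-prefixed encoding is only a placeholder for where a Parquet reader and writer would go.

```java
import java.io.Closeable;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical stand-ins for the storage abstraction (not the real pail API).
interface RecordWriter extends Closeable {
    void writeRaw(byte[] record) throws IOException;
}

interface RecordReader extends Closeable {
    // Returns the next record, or null once the stream is exhausted.
    byte[] readRawRecord() throws IOException;
}

// A format only has to know how to open a writer and a reader.
interface StorageFormat {
    RecordWriter openWriter(OutputStream out) throws IOException;
    RecordReader openReader(InputStream in) throws IOException;
}

// Placeholder format: each record is stored as a 4-byte length plus payload.
// A Parquet-backed format would implement the same two methods, but delegate
// to a Parquet writer and reader instead.
final class LengthPrefixedFormat implements StorageFormat {
    public RecordWriter openWriter(OutputStream out) {
        final DataOutputStream data = new DataOutputStream(out);
        return new RecordWriter() {
            public void writeRaw(byte[] record) throws IOException {
                data.writeInt(record.length);
                data.write(record);
            }
            public void close() throws IOException {
                data.close();
            }
        };
    }

    public RecordReader openReader(InputStream in) {
        final DataInputStream data = new DataInputStream(in);
        return new RecordReader() {
            public byte[] readRawRecord() throws IOException {
                int length;
                try {
                    length = data.readInt();
                } catch (EOFException endOfStream) {
                    return null; // no more records
                }
                byte[] record = new byte[length];
                data.readFully(record);
                return record;
            }
            public void close() throws IOException {
                data.close();
            }
        };
    }
}
```

Code that reads and writes records only talks to `StorageFormat`, so swapping the encoding means adding one more implementation rather than touching the callers.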

sorenmacbeth avatar Apr 15 '14 14:04 sorenmacbeth

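OK, so you are saying that subclassing PailFormat for Parquet, as in SequenceFileFormat.java (https://github.com/nathanmarz/dfs-datastores/blob/develop/dfs-datastores/src/main/java/com/backtype/hadoop/pail/SequenceFileFormat.java), would do the trick, right?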

jeroenvandijk avatar Apr 15 '14 16:04 jeroenvandijk

Basically, yes.

sorenmacbeth avatar Apr 15 '14 16:04 sorenmacbeth

Cool, I'll give it a try soon.

jeroenvandijk avatar Apr 15 '14 19:04 jeroenvandijk

Hi Jeroen, were you able to get this to work? I'm looking at doing the same thing.

caminic avatar Nov 23 '16 11:11 caminic

@caminic Sorry for the late response. No, I didn't get to it. Priorities shifted, and I also couldn't fully see my way through all the Java indirection. Parquet support still sounds useful, as it is supported by quite a number of tools these days.

jeroenvandijk avatar Dec 08 '16 22:12 jeroenvandijk