dfs-datastores
add parquet support
I'd like to add Parquet support in addition to sequence files.
I'd like to have this too. I haven't used Parquet yet, but it seems it would speed up most of my queries. Do you have a list of the work that needs to be done?
No, I don't have a list. I have a branch with a skeleton for it locally. The pail storage format is abstracted, so it is essentially just copying what is there for sequence files, except using Parquet files instead.
On Tue, Apr 15, 2014 at 4:10 AM, Jeroen van Dijk wrote:
> I'd like to have this too. I haven't used Parquet yet, but it seems it would speed up most of my queries. Do you have a list of the work that needs to be done?
Reply to this email directly or view it on GitHub: https://github.com/nathanmarz/dfs-datastores/issues/46#issuecomment-40469893
http://about.me/soren
Ok so you are saying subclassing PailFormat for Parquet like in SequenceFileFormat.java would do the trick, right?
basically, yes.
On Tue, Apr 15, 2014 at 9:09 AM, Jeroen van Dijk wrote:
> Ok so you are saying subclassing PailFormat for Parquet like in SequenceFileFormat.java (https://github.com/nathanmarz/dfs-datastores/blob/develop/dfs-datastores/src/main/java/com/backtype/hadoop/pail/SequenceFileFormat.java) would do the trick, right?
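For anyone picking this up, the idea discussed above (one format abstraction, one implementation per on-disk format) can be sketched roughly as follows. This is only an illustration under loud assumptions: `StorageFormat`, `RecordWriter`, `RecordReader`, and both toy formats are hypothetical stand-ins written for this sketch, not the dfs-datastores interfaces, and no Parquet library is used. The real PailFormat works with Hadoop types, so an actual Parquet format would look different in detail.

```java
import java.io.*;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Base64;
import java.util.List;

// Hypothetical, simplified stand-ins for the storage-format abstraction.
// These are NOT the dfs-datastores interfaces.
interface RecordWriter extends Closeable {
    void writeRaw(byte[] record) throws IOException;
}

interface RecordReader extends Closeable {
    byte[] readRawRecord() throws IOException; // returns null at end of data
}

interface StorageFormat {
    RecordWriter openWriter(OutputStream out) throws IOException;
    RecordReader openReader(InputStream in) throws IOException;
}

// Toy format 1: each record is a 4-byte length followed by the payload.
// It plays the role of the format that already exists.
class LengthPrefixedFormat implements StorageFormat {
    public RecordWriter openWriter(OutputStream out) {
        DataOutputStream data = new DataOutputStream(out);
        return new RecordWriter() {
            public void writeRaw(byte[] record) throws IOException {
                data.writeInt(record.length);
                data.write(record);
            }
            public void close() throws IOException { data.close(); }
        };
    }

    public RecordReader openReader(InputStream in) {
        DataInputStream data = new DataInputStream(in);
        return new RecordReader() {
            public byte[] readRawRecord() throws IOException {
                int length;
                try {
                    length = data.readInt();
                } catch (EOFException e) {
                    return null; // no more records
                }
                byte[] record = new byte[length];
                data.readFully(record);
                return record;
            }
            public void close() throws IOException { data.close(); }
        };
    }
}

// Toy format 2: one Base64-encoded record per line.
// It plays the role of the newly added format.
class Base64LineFormat implements StorageFormat {
    public RecordWriter openWriter(OutputStream out) {
        Writer writer = new OutputStreamWriter(out, StandardCharsets.US_ASCII);
        return new RecordWriter() {
            public void writeRaw(byte[] record) throws IOException {
                writer.write(Base64.getEncoder().encodeToString(record));
                writer.write('\n');
            }
            public void close() throws IOException { writer.close(); }
        };
    }

    public RecordReader openReader(InputStream in) {
        BufferedReader reader =
            new BufferedReader(new InputStreamReader(in, StandardCharsets.US_ASCII));
        return new RecordReader() {
            public byte[] readRawRecord() throws IOException {
                String line = reader.readLine();
                return line == null ? null : Base64.getDecoder().decode(line);
            }
            public void close() throws IOException { reader.close(); }
        };
    }
}

class FormatSketch {
    // Writes the records through the given format, then reads them back.
    // The caller never needs to know which format is in use.
    static List<String> roundTrip(StorageFormat format, List<String> records) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (RecordWriter writer = format.openWriter(buffer)) {
                for (String record : records) {
                    writer.writeRaw(record.getBytes(StandardCharsets.UTF_8));
                }
            }
            List<String> result = new ArrayList<>();
            InputStream in = new ByteArrayInputStream(buffer.toByteArray());
            try (RecordReader reader = format.openReader(in)) {
                byte[] raw;
                while ((raw = reader.readRawRecord()) != null) {
                    result.add(new String(raw, StandardCharsets.UTF_8));
                }
            }
            return result;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        List<String> records = Arrays.asList("first", "second");
        System.out.println(roundTrip(new LengthPrefixedFormat(), records));
        System.out.println(roundTrip(new Base64LineFormat(), records));
    }
}
```

The point of the sketch is that `roundTrip` stays the same when the second format is added, which is presumably why adding Parquet would mostly mean writing one more implementation next to the sequence-file one.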
Cool, I'll give it a try soon
Hi Jeroen, were you able to get this to work? I'm looking at doing the same thing.
@caminic Sorry for the late response. No, I didn't get to it. Priorities shifted, and I also didn't fully see my way through all the Java indirection. Parquet support still sounds useful, as it is supported by quite a number of tools these days.