dvryaboy comments

Results: 37 comments



uncomment the "register" line in the pig script

You can register '/dev/null'; it seems to work OK :).
D

On Fri, Aug 31, 2012 at 8:49 AM, Travis Crawford wrote:
> Since the unit test has the UDF class...

how to create index for field of sub-struct in thrift

That's exactly how you refer to it: `stuff = load ...; filtered_stuff = filter stuff by a.a1 == '1234';` Does this not work? Could you post the script you are...

how to create index for field of sub-struct in thrift

Ah, I see what's happening. I think this is a Pig bug: it needs to push down the filter, but nested relations confuse it. I don't see any reason...

Docs/perf best practices update

@talSofer and @itaiad200, thank you for the feedback! I incorporated a few of your suggestions and explained my motivation/reasoning for a couple of the things you pushed back on...

Compress indexes

Rewrote LzoTinyOffsets to use the VarInt implementation from Mahout, and removed the numBlocks() method from the interface. Tests pass, but I still haven't tested on real data.
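A varint stores small numbers in fewer bytes, so it only pays off when the stored values are small. The following is a minimal sketch of the idea under stated assumptions. It assumes the index is a sorted list of compressed-block offsets. It also assumes the offsets are delta-encoded before being written as varints; the comment above only mentions VarInt, so the delta step is our assumption. `TinyOffsetsSketch`, `encode`, and `decode` are hypothetical names. The sketch uses a hand-rolled LEB128-style encoder rather than Mahout's class.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

public class TinyOffsetsSketch {
    // Unsigned LEB128-style varint: 7 payload bits per byte,
    // high bit set on every byte except the last.
    static void writeUnsignedVarLong(long value, ByteArrayOutputStream out) {
        while ((value & ~0x7FL) != 0L) {
            out.write((int) ((value & 0x7F) | 0x80));
            value >>>= 7;
        }
        out.write((int) value);
    }

    static long readUnsignedVarLong(ByteArrayInputStream in) {
        long value = 0L;
        int shift = 0;
        while (true) {
            int b = in.read();
            if (b < 0) throw new IllegalStateException("truncated varint");
            value |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) return value;
            shift += 7;
        }
    }

    // Assumed layout: count, then the gap between consecutive (ascending) block offsets.
    static byte[] encode(long[] offsets) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeUnsignedVarLong(offsets.length, out);
        long prev = 0L;
        for (long off : offsets) {
            writeUnsignedVarLong(off - prev, out); // ascending offsets => non-negative deltas
            prev = off;
        }
        return out.toByteArray();
    }

    static long[] decode(byte[] bytes) {
        ByteArrayInputStream in = new ByteArrayInputStream(bytes);
        int n = (int) readUnsignedVarLong(in);
        long[] offsets = new long[n];
        long prev = 0L;
        for (int i = 0; i < n; i++) {
            prev += readUnsignedVarLong(in); // undo the delta encoding
            offsets[i] = prev;
        }
        return offsets;
    }

    public static void main(String[] args) {
        long[] offsets = {0L, 262_150L, 524_310L, 786_420L};
        byte[] packed = encode(offsets);
        System.out.println("fixed-width bytes: " + offsets.length * 8);
        System.out.println("varint bytes: " + packed.length);
        System.out.println("round trip ok: " + Arrays.equals(offsets, decode(packed)));
    }
}
```

Stored as fixed-width 8-byte longs, these four offsets take 32 bytes. The delta+varint form takes 11 bytes, because every gap fits in three varint bytes.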

@sjlee, check out this ancient pull request. The goal here is to make LZO indexes significantly smaller, which makes split calculation, etc., much faster. It's meant to be backwards-compatible (new hadoop-lzo...
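Index size matters for split calculation because each byte-range split has to be snapped to compressed-block boundaries, and doing that requires the file's block offsets. The following is a hedged sketch of that alignment step over an in-memory, sorted offset array. `SplitAlignSketch` and its methods are hypothetical and are not hadoop-lzo's actual index API.

```java
import java.util.Arrays;

public class SplitAlignSketch {
    // Offset of the first compressed block starting at or after pos, or -1 if none.
    static long firstBlockAtOrAfter(long[] blockOffsets, long pos) {
        int i = Arrays.binarySearch(blockOffsets, pos);
        if (i < 0) i = -i - 1; // convert to insertion point: first element > pos
        return i < blockOffsets.length ? blockOffsets[i] : -1L;
    }

    // Snap a byte-range split [start, end) to block boundaries.
    // Returns {alignedStart, alignedEnd}; the split is empty when they are equal.
    static long[] alignSplit(long[] blockOffsets, long fileLength, long start, long end) {
        long s = firstBlockAtOrAfter(blockOffsets, start);
        long e = firstBlockAtOrAfter(blockOffsets, end);
        if (s < 0) s = fileLength; // no block starts in or after this split
        if (e < 0) e = fileLength; // the split runs to the end of the file
        return new long[] {s, e};
    }

    public static void main(String[] args) {
        long[] blocks = {0L, 100L, 250L, 400L}; // illustrative offsets
        System.out.println(Arrays.toString(alignSplit(blocks, 520L, 0L, 200L)));
        System.out.println(Arrays.toString(alignSplit(blocks, 520L, 200L, 520L)));
        System.out.println(Arrays.toString(alignSplit(blocks, 520L, 410L, 450L)));
    }
}
```

Each block is assigned to the split in which it starts, so adjacent splits neither overlap nor drop a block. A split that contains no block start comes out empty.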

Options to skip small files and not recurse on input paths

This is odd; I swear we did this years ago. @rangadi, do you remember what the deal is? Is this something we put into EB instead of hadoop-lzo?
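The two options in the title amount to a filter over the input listing. The following is a minimal sketch under stated assumptions. `Entry` is a hypothetical stand-in for Hadoop's `FileStatus`. `minSize` and `recurse` are illustrative parameter names, not real configuration keys in either project.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class InputFilterSketch {
    // Hypothetical stand-in for a file listing entry; children == null means "plain file".
    static final class Entry {
        final String path;
        final long length;
        final List<Entry> children;

        Entry(String path, long length) {
            this(path, length, null);
        }

        Entry(String path, long length, List<Entry> children) {
            this.path = path;
            this.length = length;
            this.children = children;
        }

        boolean isDirectory() {
            return children != null;
        }
    }

    // minSize: skip files smaller than this many bytes.
    // recurse: whether to descend into subdirectories of an input directory.
    static List<String> selectInputs(List<Entry> inputs, long minSize, boolean recurse) {
        List<String> selected = new ArrayList<>();
        for (Entry input : inputs) {
            collect(input, minSize, recurse, true, selected);
        }
        return selected;
    }

    private static void collect(Entry e, long minSize, boolean recurse, boolean topLevel, List<String> selected) {
        if (e.isDirectory()) {
            // An input directory's direct children are always listed; deeper levels only when recurse is set.
            if (topLevel || recurse) {
                for (Entry child : e.children) {
                    collect(child, minSize, recurse, false, selected);
                }
            }
        } else if (e.length >= minSize) {
            selected.add(e.path);
        }
    }

    public static void main(String[] args) {
        Entry in = new Entry("/in", 0L, Arrays.asList(
                new Entry("/in/a.lzo", 1000L),
                new Entry("/in/tiny.lzo", 10L),
                new Entry("/in/sub", 0L, Arrays.asList(new Entry("/in/sub/b.lzo", 5000L)))));
        System.out.println(selectInputs(Arrays.asList(in), 100L, false)); // [/in/a.lzo]
        System.out.println(selectInputs(Arrays.asList(in), 100L, true));  // [/in/a.lzo, /in/sub/b.lzo]
    }
}
```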

Smaller lzo indexes: make #43 apply to current codebase

Ugh, macOS is fighting git over file capitalization... one sec.

Smaller lzo indexes: make #43 apply to current codebase

OK, the test failures appear unrelated:

```
SEVERE: Could not load native gpl library
java.lang.UnsatisfiedLinkError: no gplcompression in java.library.path
```

The new tests actually pass.
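The error `UnsatisfiedLinkError: no gplcompression in java.library.path` means the JVM could not find the native `gplcompression` library in any directory listed in `java.library.path`. On Linux, that library is `libgplcompression.so`. The failure points at the test environment rather than the code under review. The following is a small, hypothetical check (`NativeLoadCheck` is an illustrative name) that attempts the same load but reports failure instead of throwing.

```java
public class NativeLoadCheck {
    // Try to load a native library by name; return false instead of throwing
    // when it cannot be found on java.library.path.
    static boolean tryLoad(String libName) {
        try {
            System.loadLibrary(libName);
            return true;
        } catch (UnsatisfiedLinkError e) {
            System.err.println("Could not load native library " + libName + ": " + e.getMessage());
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("java.library.path = " + System.getProperty("java.library.path"));
        System.out.println("gplcompression loaded: " + tryLoad("gplcompression"));
    }
}
```

Running it with `-Djava.library.path=...`, pointed at the directory that holds the built native library, should flip the result. That directory depends on your build and environment.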

Avoid creating LZO indexes on files not spread across several blocks

@rangadi, don't we already skip index creation somewhere? I know we don't create them for small files (I don't recall whether "small" means smaller than the block size).
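One way to read "spread across several blocks": index a file only when it spans more than one HDFS block. A file that fits in a single block gets a single split anyway, so an index buys nothing. The following is a sketch of that rule, assuming split size equals block size. `IndexDecisionSketch` and the 128 MiB block size are illustrative assumptions, not hadoop-lzo's actual threshold.

```java
public class IndexDecisionSketch {
    // Number of blocks a file of the given length occupies (ceiling division).
    static long blocksSpanned(long fileLength, long blockSize) {
        return (fileLength + blockSize - 1) / blockSize;
    }

    // Hypothetical rule: index only files that span more than one block.
    static boolean shouldIndex(long fileLength, long blockSize) {
        return blocksSpanned(fileLength, blockSize) > 1;
    }

    public static void main(String[] args) {
        long blockSize = 128L * 1024 * 1024; // assumed 128 MiB block size
        long[] lengths = {100L * 1024 * 1024, blockSize, 300L * 1024 * 1024};
        for (long len : lengths) {
            System.out.println(len + " bytes -> blocks=" + blocksSpanned(len, blockSize)
                    + ", index=" + shouldIndex(len, blockSize));
        }
    }
}
```

Under this rule, a file of exactly one block is not indexed, while anything even one byte larger is.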