provide an option for Lzo inputformat not to read index file (or read remotely)

Open rangadi opened this issue 12 years ago • 0 comments

The index for an LZO file is read on the client while making the splits. For large inputs this takes a very long time, since the files are read serially.

Sometimes users may not need to split the files (say, there are already lots of files), so a simple option to disable reading the index might be good enough.

Another option is to read the index on the remote tasks. Each record reader adjusts its split based on the index.
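A rough sketch of what the remote adjustment could look like, assuming the task has already loaded the index as a sorted array of compressed-block start offsets. The class and method names (`SplitAlignSketch`, `alignToIndex`, `firstBlockAtOrAfter`) are made up for illustration and are not existing elephant-bird or hadoop-lzo APIs. The client would hand out plain byte-range splits without touching the index, and each record reader would move its start and end forward to the next block boundary:

```java
import java.util.Arrays;

// Hypothetical helper, not part of elephant-bird: aligns a raw byte-range
// split to LZO block boundaries using an index loaded on the task side.
final class SplitAlignSketch {

    /**
     * Returns {alignedStart, alignedEnd}. If both values are equal, the
     * split contains no block start and the reader has nothing to do.
     */
    static long[] alignToIndex(long[] blockOffsets, long start, long end, long fileLength) {
        return new long[] {
            firstBlockAtOrAfter(blockOffsets, start, fileLength),
            firstBlockAtOrAfter(blockOffsets, end, fileLength)
        };
    }

    /** First block offset >= pos, or fileLength if no such block exists. */
    static long firstBlockAtOrAfter(long[] blockOffsets, long pos, long fileLength) {
        int i = Arrays.binarySearch(blockOffsets, pos);
        if (i < 0) {
            i = -i - 1; // binarySearch returns -(insertionPoint) - 1 on a miss
        }
        return i < blockOffsets.length ? blockOffsets[i] : fileLength;
    }

    public static void main(String[] args) {
        long[] index = {0, 100, 250, 400}; // assumed block start offsets
        long[] aligned = alignToIndex(index, 120, 300, 500);
        System.out.println(aligned[0] + " " + aligned[1]);
    }
}
```

Because every reader rounds both ends forward with the same rule, adjacent raw splits still cover each block exactly once: with the offsets above, raw splits [0, 120), [120, 300), [300, 500) become [0, 250), [250, 400), [400, 500).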

rangadi, Oct 30 '12 20:10