hadoop-lzo-packager icon indicating copy to clipboard operation
hadoop-lzo-packager copied to clipboard

Packaging utilities for GPL compression libraries in Hadoop

Overview

This project is a convenient mechanism of packaging the hadoop-gpl-compression project from Google Code.

It has three basic steps:

  1. Perform an svn export of the most recent revision of hadoop-gpl-compression
  2. Create and build an RPM
  3. Create an build a Debian package

Requirements:

  • subversion
  • java (preferably sun's JDK)
    • JAVA_HOME must be set in your environment
  • appropriate package building tools and lzo libs for your platform
    • yum install rpm-build lzo-devel (RedHat based)
    • apt-get install dev-scripts liblzo2-dev (Debian based)
  • ant version 1.7.0 or greater (RedHat will require some fiddling[1])

When you try to build for your platform, build dependency errors will also inform you of any other packages you may need to install (eg lzo2 devel packages, ant, etc)

[1]: First you'll need to install "ant" and "ant-nodeps" with yum. Then You'll need to download the binary Apache Ant distribution from their website, and extract the tarball somewhere. You'll have to set ANT_HOME in your environment to point to the newly archived directory. You'll also have to put $ANT_HOME/bin in your path, before /usr/bin. Running "ant" on the command line should run $ANT_HOME/bin/ant, not /usr/bin/ant.

Usage:

To build packages, simply run the included shell script.

./run.sh

We recommend you run this on the same platform as your tasktrackers so as to be sure the built libraries are compatible.

Various options are available, to get help do:

./run.sh -h

If you would like to skip building debian or rpm, you can do:

./run.sh --no-rpm or ./run.sh --no-deb

If you'd like to check out a particular revision, you can do:

./run.sh --svn-rev=46

If the downloads fail because of certificate problems, you can do:

WGET_OPTS=--no-check-certificate ./run.sh

If the build fails and you find a file build/master then you have a version of wget which does not use the filename from the redirected URL. You can work around it with:

WGET_OPTS=--trust-server-names=on ./run.sh

Or with both options:

WGET_OPTS="--no-check-certificate --trust-server-names=on" ./run.sh

There are some other variables that can be overridden - simply look at the top section of run.sh to learn what they are.

After running the script, you should be able to find debs in the build/deb directory and RPMs in the build/topdir/RPMS directory.

Contributing

To contribute to this project, please clone its repository from http://github.com/toddlipcon/hadoop-lzo-packager/ and commit patches to your github repository. When you would like to submit your contribution for inclusion, send a Pull Request to the Cloudera repository.