HadooPHP
A framework for writing Hadoop Streaming jobs in PHP
Prevents dreadful parse errors that you would otherwise only discover once the job is running in Hadoop
In a long shell command split across lines, it's easy to miss the backslash at the end of a line, or the space in front of that backslash. Such mistakes are just annoying to debug.
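To illustrate catching these mistakes before a job is submitted, here is a hypothetical pre-flight lint, sketched in Python. The function `lint_continuations` and its three rules are assumptions, not part of HadooPHP. It works line by line and ignores quoting, comments, and heredocs, so treat it as a heuristic rather than a shell parser.

```python
import re

# Heuristic: a line whose first token is an option flag such as "-input".
OPTION_LINE = re.compile(r"^\s*-\w")


def lint_continuations(script):
    """Return (line_no, message) pairs for suspicious shell line continuations."""
    problems = []
    lines = script.splitlines()
    for i, line in enumerate(lines):
        line_no = i + 1
        next_line = lines[i + 1] if i + 1 < len(lines) else ""
        if line.endswith("\\"):
            # The shell deletes backslash-newline, so "in\" followed by an
            # unindented "-output" becomes the single word "in-output".
            glued = len(line) > 1 and not line[-2].isspace()
            if glued and next_line and not next_line[0].isspace():
                problems.append((line_no, "no space before backslash; words glued"))
        elif line.rstrip(" \t").endswith("\\"):
            # The backslash escapes the trailing space, not the newline,
            # so the command silently ends on this line.
            problems.append((line_no, "whitespace after backslash"))
        elif OPTION_LINE.match(next_line):
            # The next line looks like a continuation, but this one does not
            # end with a backslash, so the next line runs as its own command.
            problems.append((line_no, "missing backslash at end of line"))
    return problems
```

A missing space only bites when the continuation line is not indented, because the indentation survives the backslash-newline removal and still separates the words; the glued-words rule checks for exactly that case.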
Maybe use counters to keep track of memory usage, and if it keeps going up in a reducer, emit warnings about possible buffering in user code. Win.
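Hadoop Streaming lets a task update a counter by writing a line of the form `reporter:counter:<group>,<counter>,<amount>` to stderr. The sketch below (in Python, for illustration) samples memory once per key group and bumps a counter when usage has risen across `window` consecutive groups. The group and counter names (`HadooPHP`, `MemoryGrowthWarnings`), the window size, and the class and function names are all hypothetical; a PHP version would presumably sample `memory_get_usage()` instead of `tracemalloc`.

```python
import itertools
import sys
import tracemalloc


def counter_line(group, counter, amount=1):
    # Hadoop Streaming treats stderr lines of this shape as counter updates.
    return f"reporter:counter:{group},{counter},{amount}"


class MemoryGrowthWatch:
    """Flags memory that rose across `window` consecutive samples (heuristic)."""

    def __init__(self, window=5):
        self.window = window
        self.samples = []

    def record(self, usage):
        self.samples.append(usage)
        recent = self.samples[-(self.window + 1):]
        if len(recent) <= self.window:
            return False
        return all(later > earlier for earlier, later in zip(recent, recent[1:]))


def reduce_lines(lines, watch, sample, out=sys.stdout, err=sys.stderr):
    """Count values per key in sorted, tab-separated key/value input."""
    rows = (line.rstrip("\n").split("\t", 1) for line in lines)
    for key, group in itertools.groupby(rows, key=lambda row: row[0]):
        count = sum(1 for _ in group)
        out.write(f"{key}\t{count}\n")
        # Sample once per key group; steady growth suggests user code is
        # buffering values instead of streaming them.
        if watch.record(sample()):
            err.write(counter_line("HadooPHP", "MemoryGrowthWarnings") + "\n")


def main():
    # tracemalloc only sees Python-level allocations.
    tracemalloc.start()
    reduce_lines(sys.stdin, MemoryGrowthWatch(),
                 sample=lambda: tracemalloc.get_traced_memory()[0])
```

Injecting `sample` keeps the heuristic testable. Note that steadily rising memory can also have benign causes (caches, allocator behavior), so this is a warning, not a failure.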
https://issues.apache.org/jira/browse/HADOOP-1722 (backported in CDH3) covers typed bytes; I'm pretty sure I saw a raw bytes format in the source as well.
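In the typed bytes format, each value is a one-byte type code followed by its data, written big-endian as Java's `DataOutput` does. Codes 0 to 7 are bytes, byte, boolean, int, long, float, double, and string; the container types (vector, list, map) are left out here. The sketch below is a Python illustration of the primitive types only; the function names are hypothetical, and choosing int versus long by value range is an assumption of this sketch.

```python
import io
import struct

# Type codes for the primitive typed bytes types (containers omitted).
BYTES, BYTE, BOOL, INT, LONG, FLOAT, DOUBLE, STRING = range(8)


def write_typed(out, value):
    """Encode one Python value as a typed bytes record (big-endian)."""
    if isinstance(value, bool):  # check before int: bool is an int subclass
        out.write(struct.pack(">B?", BOOL, value))
    elif isinstance(value, int):
        if -2**31 <= value < 2**31:
            out.write(struct.pack(">Bi", INT, value))
        else:
            out.write(struct.pack(">Bq", LONG, value))
    elif isinstance(value, float):
        out.write(struct.pack(">Bd", DOUBLE, value))
    elif isinstance(value, str):
        data = value.encode("utf-8")
        out.write(struct.pack(">Bi", STRING, len(data)) + data)
    elif isinstance(value, bytes):
        out.write(struct.pack(">Bi", BYTES, len(value)) + value)
    else:
        raise TypeError(f"unsupported type: {type(value).__name__}")


def _read_exact(inp, n):
    data = inp.read(n)
    if len(data) != n:
        raise EOFError("end of input or truncated typed bytes record")
    return data


def read_typed(inp):
    """Decode one primitive typed bytes value; raises EOFError at end of input."""
    code = _read_exact(inp, 1)[0]
    if code in (BYTES, STRING):
        # Length-prefixed payload: 4-byte signed big-endian length.
        (length,) = struct.unpack(">i", _read_exact(inp, 4))
        data = _read_exact(inp, length)
        return data if code == BYTES else data.decode("utf-8")
    formats = {BYTE: ">b", BOOL: ">?", INT: ">i", LONG: ">q", FLOAT: ">f", DOUBLE: ">d"}
    if code not in formats:
        raise ValueError(f"unsupported type code {code}")
    fmt = formats[code]
    return struct.unpack(fmt, _read_exact(inp, struct.calcsize(fmt)))[0]
```

Because every record is self-describing, a reader never has to guess where a value ends, which is what makes binary keys and values safe where tab- and newline-delimited text is not.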
The functionality in CDH3 is backported from 0.22; see https://issues.apache.org/jira/browse/MAPREDUCE-1785 for details. Check Mapper to see whether we need to do more detection (if that is even possible...).
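One possible detection approach, instead of sniffing bytes on stdin, is to read the job configuration from the task environment. This sketch assumes that the streaming `-io` option sets `stream.map.input` and `stream.reduce.input`, and that Streaming exports job configuration keys as environment variables with non-alphanumeric characters replaced by `_` (so `stream.map.input` becomes `stream_map_input`). Falling back to `text` when the variable is absent is also an assumption, and `detect_input_format` is a hypothetical name, sketched in Python.

```python
import os

KNOWN_FORMATS = {"text", "rawbytes", "typedbytes"}


def detect_input_format(phase, environ=None):
    """Guess the streaming input format for phase "map" or "reduce"."""
    if phase not in ("map", "reduce"):
        raise ValueError("phase must be 'map' or 'reduce'")
    env = os.environ if environ is None else environ
    # stream.<phase>.input, as exported to the task environment.
    value = env.get(f"stream_{phase}_input", "text")
    if value not in KNOWN_FORMATS:
        # Custom identifier resolvers may use other names; don't guess.
        raise ValueError(f"unrecognized stream input format: {value!r}")
    return value
```

Failing loudly on an unknown identifier seems safer than silently decoding binary input as text.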