dynamometer
Blockgen job fails to clean up failed reduce attempts
The block generation job has custom output logic to allow each reducer to output to multiple block files.
When speculative execution is enabled, this can result in two copies of the same block file being generated, one of which may be incomplete. This can be worked around by setting mapreduce.reduce.speculative to false.
When a reducer attempt fails, its partial output files are not cleaned up. I'm not aware of an easy workaround for this beyond manually cleaning up the files after the job completes.
We should have each reducer use a staging directory and only move the output files when it completes.
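To make the proposal concrete, here is a minimal sketch of the staging-directory pattern. It is not Dynamometer's code and does not use the Hadoop FileSystem API: it uses java.nio.file on a local filesystem purely for illustration, and the class name StagedBlockOutput, the _staging directory name, and the attempt ID strings are all hypothetical. A real fix would presumably do the equivalent with HDFS renames inside the reducer's output logic.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Hypothetical helper illustrating the pattern; not part of Dynamometer or Hadoop. */
public class StagedBlockOutput {
  private final Path finalDir;
  private final Path stagingDir;

  public StagedBlockOutput(Path finalDir, String attemptId) throws IOException {
    this.finalDir = finalDir;
    // Each attempt writes into its own private directory, so a failed or
    // speculative attempt never touches the final output location.
    this.stagingDir = finalDir.resolve("_staging").resolve(attemptId);
    Files.createDirectories(stagingDir);
  }

  /** Writes one block file into this attempt's staging directory. */
  public void writeBlock(String name, byte[] data) throws IOException {
    Files.write(stagingDir.resolve(name), data);
  }

  /** Called only when the attempt succeeds: moves every staged file into place. */
  public void commit() throws IOException {
    try (DirectoryStream<Path> staged = Files.newDirectoryStream(stagingDir)) {
      for (Path file : staged) {
        // REPLACE_EXISTING lets a second complete attempt overwrite the
        // first one's copy of the same block file.
        Files.move(file, finalDir.resolve(file.getFileName()),
            StandardCopyOption.REPLACE_EXISTING);
      }
    }
    Files.delete(stagingDir);
  }

  /** Called when the attempt fails: discards the partial output. */
  public void abort() throws IOException {
    if (!Files.exists(stagingDir)) {
      return;
    }
    try (DirectoryStream<Path> staged = Files.newDirectoryStream(stagingDir)) {
      for (Path file : staged) {
        Files.delete(file);
      }
    }
    Files.delete(stagingDir);
  }
}
```

With this shape, a failed attempt leaves nothing in the final directory, and only complete files are ever moved there, which would also cover the speculative execution case.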