Improvements in Hadoop's s3a output committers obsolete class S3FileOutputFormat
The Fetcher in this Nutch fork (Common Crawl) uses S3FileOutputFormat to work around issues with the default FileOutputCommitter. These issues have been addressed in Hadoop 3.1.x and later (see the S3A Committers documentation), which should make S3FileOutputFormat obsolete.
Update: S3 is now strongly consistent (1, 2), which allows the MagicCommitter to be used without S3Guard. See also HADOOP-17483.
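For reference, a minimal sketch of the settings that would select the magic committer for `s3a://` output, assuming the property names given in the Hadoop S3A Committers documentation. The class `MagicCommitterSettings` and its methods are hypothetical helpers for illustration only, not part of Nutch or Hadoop. Whether the Fetcher works with these settings once S3FileOutputFormat is dropped has not been verified here.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Nutch or Hadoop: collects the S3A committer
// properties and renders them as -D options for a Hadoop job command line.
public class MagicCommitterSettings {

    /** Properties that select the S3A "magic" committer for s3a:// output paths. */
    public static Map<String, String> magicCommitterProperties() {
        Map<String, String> props = new LinkedHashMap<>();
        // Route output committer creation for the s3a scheme to the S3A factory.
        props.put("mapreduce.outputcommitter.factory.scheme.s3a",
                  "org.apache.hadoop.fs.s3a.commit.S3ACommitterFactory");
        // Choose the magic committer among the S3A committers.
        props.put("fs.s3a.committer.name", "magic");
        // Enable support for "magic" paths in the S3A file system.
        props.put("fs.s3a.committer.magic.enabled", "true");
        return props;
    }

    /** Renders the properties as space-separated -Dkey=value options. */
    public static String toGenericOptions(Map<String, String> props) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : props.entrySet()) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append("-D").append(e.getKey()).append('=').append(e.getValue());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toGenericOptions(magicCommitterProperties()));
    }
}
```

In an actual job the same key/value pairs would more likely be set in the job configuration or in `core-site.xml` rather than printed; the map form is used here only to keep the sketch self-contained.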