Improvements in Hadoop's s3a output committers obsolete class S3FileOutputFormat

Open • sebastian-nagel opened this issue 5 years ago • 1 comment

The Fetcher in this Nutch fork (Common Crawl) uses S3FileOutputFormat to work around issues with the default FileOutputCommitter when writing to S3. These issues are addressed by the S3A committers in Hadoop 3.1.x and later (see the S3A Committers documentation), which should make S3FileOutputFormat obsolete.

sebastian-nagel, Nov 22 '19 10:11

Update: S3 is now consistent (1, 2), which makes it possible to use the MagicCommitter without S3Guard. See also HADOOP-17483.

sebastian-nagel, Nov 03 '21 10:11
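
For illustration only, here is a minimal sketch of the Hadoop properties that would select the magic committer for `s3a://` output, rendered as generic `-D` options. The three property names come from the Hadoop S3A committers documentation. The class `MagicCommitterOptions` and its methods are hypothetical helpers, not part of Nutch or Hadoop, and whether the Fetcher job picks these options up unchanged is an assumption that has not been verified against this fork.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

/** Hypothetical helper: collects the S3A magic committer settings in one place. */
public class MagicCommitterOptions {

    /** Returns the committer properties in a stable insertion order. */
    public static Map<String, String> magicCommitterProperties() {
        Map<String, String> props = new LinkedHashMap<>();
        // Route output committer creation for s3a:// paths to the S3A committer factory.
        props.put("mapreduce.outputcommitter.factory.scheme.s3a",
                  "org.apache.hadoop.fs.s3a.commit.S3ACommitterFactory");
        // Select the magic committer (the alternatives are "directory" and "partitioned").
        props.put("fs.s3a.committer.name", "magic");
        // Set explicitly in case the running Hadoop version does not enable it by default.
        props.put("fs.s3a.committer.magic.enabled", "true");
        return props;
    }

    /** Renders the properties as Hadoop generic options: "-D key=value -D key=value ...". */
    public static String toGenericOptions(Map<String, String> props) {
        StringJoiner joiner = new StringJoiner(" ");
        for (Map.Entry<String, String> entry : props.entrySet()) {
            joiner.add("-D " + entry.getKey() + "=" + entry.getValue());
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        // Prints the options so they can be pasted into a job command line.
        System.out.println(toGenericOptions(magicCommitterProperties()));
    }
}
```

The same key/value pairs could instead go into `core-site.xml` or be set on the job's `Configuration`; the `-D` form is used here only because it keeps the sketch free of Hadoop dependencies.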