spark-distcp
Unable to copy from S3 bucket to another S3 bucket
Hi, when I try to use this API to copy files from one S3 bucket to another, it fails with an error. I traced the failure to the file rename operation.
Is it possible to support S3-to-S3 copy?
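To confirm the rename is the culprit, the same rename can be tried directly against the target bucket, outside spark-distcp; a minimal sketch, assuming placeholder bucket/key names and the same sparkSession used in the code below:

import org.apache.hadoop.fs.{FileSystem, Path}

// Hypothetical check: S3A implements rename as copy + delete, so a bare
// rename against the target bucket issues the same S3 COPY request that
// fails inside spark-distcp. Bucket and key names are placeholders.
val conf = sparkSession.sparkContext.hadoopConfiguration
val fs = FileSystem.get(new java.net.URI("s3a://bucket2/"), conf)
val tmp = new Path("s3a://bucket2/key/_rename_check_tmp")
val moved = new Path("s3a://bucket2/key/_rename_check_moved")
fs.create(tmp, true).close()   // write an empty marker object
val ok = fs.rename(tmp, moved) // goes through S3AFileSystem.copyFile
println(s"bare rename succeeded: $ok")

Here is the code I am running: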
import org.apache.hadoop.fs.{FileSystem, Path}
import com.coxautodata.{SparkDistCP, SparkDistCPOptions}

val srcPath = new Path("s3a://bucket1/key")
val targetPath = new Path("s3a://bucket2/key/")
val targetFs = FileSystem.get(targetPath.toUri, sparkSession.sparkContext.hadoopConfiguration)
// Create the target directory if it does not already exist
if (!targetFs.exists(targetPath)) {
  targetFs.mkdirs(targetPath)
}
SparkDistCP.run(sparkSession, Seq[Path](srcPath), targetPath,
  SparkDistCPOptions(ignoreErrors = false, numListstatusThreads = 100, overwrite = true))
Error:
Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 21.0 failed 1 times, most recent failure: Lost task 0.0 in stage 21.0 (TID 132) (192.168.68.105 executor driver): java.nio.file.AccessDeniedException: ...
    at org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:230)
    at org.apache.hadoop.fs.s3a.Invoker.once(Invoker.java:111)
    at org.apache.hadoop.fs.s3a.Invoker.once(Invoker.java:125)
    at org.apache.hadoop.fs.s3a.S3AFileSystem.copyFile(S3AFileSystem.java:2541)
    at org.apache.hadoop.fs.s3a.S3AFileSystem.innerRename(S3AFileSystem.java:996)
    at org.apache.hadoop.fs.s3a.S3AFileSystem.rename(S3AFileSystem.java:863)
    at com.coxautodata.utils.CopyUtils$.$anonfun$performCopy$4(CopyUtils.scala:380)
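The trace fails inside S3AFileSystem.innerRename, i.e. in the rename step: S3A implements rename as a server-side COPY followed by a DELETE, and it is that COPY request which gets the AccessDeniedException. A copy that streams each object directly to its final key would avoid the rename path entirely; a minimal sketch, assuming only the standard Hadoop FileSystem API (copyObject is a hypothetical helper, not part of spark-distcp):

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.hadoop.io.IOUtils

// Hypothetical helper: copy one object between buckets by streaming bytes
// straight to the final key, with no temp file and no rename, so S3A's
// copy-then-delete rename path is never exercised.
def copyObject(src: Path, dest: Path, conf: Configuration): Unit = {
  val srcFs = FileSystem.get(src.toUri, conf)
  val destFs = FileSystem.get(dest.toUri, conf)
  val in = srcFs.open(src)
  val out = destFs.create(dest, true)    // overwrite if the key exists
  IOUtils.copyBytes(in, out, conf, true) // last arg closes both streams
}

Something along these lines in the copy step would let S3-to-S3 copies work without relying on rename.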
Thanks,
Sathya