
Unable to copy from S3 bucket to another S3 bucket

Open SathyaDhana opened this issue 3 years ago • 0 comments

Hi, when I try to use this API to copy files from one S3 bucket to another, the job fails with an error. I noticed it is due to the file rename operation.

Is it possible to support S3 to S3 copy?

    val srcPath = new Path("s3a://bucket1/key")
    val targetPath = new Path("s3a://bucket2/key/")

    val targetFs = FileSystem.get(targetPath.toUri, sparkSession.sparkContext.hadoopConfiguration)
    if (!targetFs.exists(targetPath)) {
      targetFs.mkdirs(targetPath)
    }

    SparkDistCP.run(sparkSession, Seq[Path](srcPath), targetPath,
      SparkDistCPOptions(ignoreErrors = false, numListstatusThreads = 100, overwrite = true))

Error:

    Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 21.0 failed 1 times, most recent failure: Lost task 0.0 in stage 21.0 (TID 132) (192.168.68.105 executor driver): java.nio.file.AccessDeniedException:...
      at org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:230)
      at org.apache.hadoop.fs.s3a.Invoker.once(Invoker.java:111)
      at org.apache.hadoop.fs.s3a.Invoker.once(Invoker.java:125)
      at org.apache.hadoop.fs.s3a.S3AFileSystem.copyFile(S3AFileSystem.java:2541)
      at org.apache.hadoop.fs.s3a.S3AFileSystem.innerRename(S3AFileSystem.java:996)
      at org.apache.hadoop.fs.s3a.S3AFileSystem.rename(S3AFileSystem.java:863)
      at com.coxautodata.utils.CopyUtils$.$anonfun$performCopy$4(CopyUtils.scala:380)
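For reference, the trace shows the failure inside `S3AFileSystem.rename`, which S3A implements as a server-side copy plus delete, so the credentials in use need read/write/delete access on the target bucket's objects (including any temporary files the library renames into place). If the two buckets require different credentials, Hadoop's standard per-bucket S3A configuration might help; a sketch with placeholder key values, using the bucket names from the snippet above:

```scala
// Sketch only: per-bucket S3A credentials (placeholder values), set on the
// Hadoop configuration before creating the FileSystem / running SparkDistCP.
// fs.s3a.bucket.<name>.* is standard Hadoop S3A per-bucket configuration.
val conf = sparkSession.sparkContext.hadoopConfiguration
conf.set("fs.s3a.bucket.bucket1.access.key", "<source-access-key>")
conf.set("fs.s3a.bucket.bucket1.secret.key", "<source-secret-key>")
conf.set("fs.s3a.bucket.bucket2.access.key", "<target-access-key>")
conf.set("fs.s3a.bucket.bucket2.secret.key", "<target-secret-key>")
```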

Thanks Sathya

SathyaDhana avatar Feb 11 '22 23:02 SathyaDhana