
spark-distcp sourcefile destfile fails

Open gquintana opened this issue 3 years ago • 0 comments

When the source and destination are both files rather than directories, Spark DistCP fails:

2022-02-04 15:58:43,344 [main] INFO  org.apache.spark.scheduler.DAGScheduler - Job 1 finished: isEmpty at FileListUtils.scala:256, took 0.040579 s
Exception in thread "main" java.lang.RuntimeException: Collisions found where multiple source files lead to the same destination location; check executor logs for specific collision detail.
        at com.coxautodata.utils.FileListUtils$.handleSourceCollisions(FileListUtils.scala:258)
        at com.coxautodata.utils.FileListUtils$.getSourceFiles(FileListUtils.scala:212)
        at com.coxautodata.SparkDistCP$.run(SparkDistCP.scala:98)
        at com.coxautodata.SparkDistCP$.main(SparkDistCP.scala:52)
        at com.coxautodata.SparkDistCP.main(SparkDistCP.scala)

It fails in v0.2.5; it used to work in v0.2.2.

I don't see any log lines starting with "The following files will collide on destination file" before the exception.

gquintana, Feb 04 '22 15:02