spark-distcp
spark-distcp sourcefile destfile fails
When the source and destination are both files rather than directories, Spark DistCP fails:
2022-02-04 15:58:43,344 [main] INFO org.apache.spark.scheduler.DAGScheduler - Job 1 finished: isEmpty at FileListUtils.scala:256, took 0.040579 s
Exception in thread "main" java.lang.RuntimeException: Collisions found where multiple source files lead to the same destination location; check executor logs for specific collision detail.
at com.coxautodata.utils.FileListUtils$.handleSourceCollisions(FileListUtils.scala:258)
at com.coxautodata.utils.FileListUtils$.getSourceFiles(FileListUtils.scala:212)
at com.coxautodata.SparkDistCP$.run(SparkDistCP.scala:98)
at com.coxautodata.SparkDistCP$.main(SparkDistCP.scala:52)
at com.coxautodata.SparkDistCP.main(SparkDistCP.scala)
It fails in v0.2.5; it used to work in v0.2.2.
I don't see any log lines starting with "The following files will collide on destination file" before the exception.