[SPARK-40469][CORE] Avoid creating directory failures
What changes were proposed in this pull request?
This PR replaces Files.createDirectory with Files.createDirectories.
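The behavioral difference between the two calls can be sketched as follows: Files.createDirectory throws NoSuchFileException when the parent directory is missing, while Files.createDirectories silently creates any missing parents first. (This is a standalone illustration of the java.nio.file API, not the actual DiskBlockManager patch.)

```java
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;

public class CreateDirsDemo {
    public static void main(String[] args) throws Exception {
        Path base = Files.createTempDirectory("demo");
        // Simulate the failure: the "parent" component does not exist yet,
        // like a blockmgr subdirectory whose parent YARN has cleaned up.
        Path nested = base.resolve("parent").resolve("child");

        // createDirectory requires the parent to already exist
        try {
            Files.createDirectory(nested);
        } catch (NoSuchFileException e) {
            System.out.println("createDirectory failed: " + e.getMessage());
        }

        // createDirectories creates the whole chain, so it succeeds
        Files.createDirectories(nested);
        System.out.println("exists: " + Files.exists(nested));
    }
}
```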
Why are the changes needed?
To avoid directory-creation failures when the parent directory has been removed by YARN:
java.nio.file.NoSuchFileException: /hadoop/3/yarn/local/usercache/<User Name>/appcache/application_1654776504115_37917/blockmgr-e18b484f-8c49-4c7d-b649-710439b0e4c3/3c
at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
at sun.nio.fs.UnixFileSystemProvider.createDirectory(UnixFileSystemProvider.java:384)
at java.nio.file.Files.createDirectory(Files.java:674)
at org.apache.spark.storage.DiskBlockManager.getFile(DiskBlockManager.scala:123)
at org.apache.spark.storage.DiskBlockManager.getFile(DiskBlockManager.scala:146)
at org.apache.spark.storage.DiskStore.contains(DiskStore.scala:147)
at org.apache.spark.storage.BlockManager.getLocalValues(BlockManager.scala:853)
at org.apache.spark.broadcast.TorrentBroadcast.$anonfun$readBroadcastBlock$4(TorrentBroadcast.scala:253)
at scala.Option.getOrElse(Option.scala:189)
at org.apache.spark.broadcast.TorrentBroadcast.$anonfun$readBroadcastBlock$2(TorrentBroadcast.scala:250)
at org.apache.spark.util.KeyLock.withLock(KeyLock.scala:64)
at org.apache.spark.broadcast.TorrentBroadcast.$anonfun$readBroadcastBlock$1(TorrentBroadcast.scala:245)
at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1383)
at org.apache.spark.broadcast.TorrentBroadcast.readBroadcastBlock(TorrentBroadcast.scala:245)
at org.apache.spark.broadcast.TorrentBroadcast.getValue(TorrentBroadcast.scala:109)
at org.apache.spark.broadcast.Broadcast.value(Broadcast.scala:70)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:86)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:52)
at org.apache.spark.scheduler.Task.run(Task.scala:132)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:487)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1417)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:490)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Does this PR introduce any user-facing change?
No.
How was this patch tested?
Tested manually.
The parent directory would not typically be removed unless we are shutting down the container (for example), and in those cases we usually should not be recreating the parent dirs.
Under what circumstances did we observe this?
We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable. If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!