barefoot
Prevent stack overflow during KState remove
During map-matching, I was seeing occasional failures caused by a StackOverflowError:
17/06/23 04:02:11 WARN TaskSetManager: Lost task 1351.0 in stage 3.0 (TID 12713, 172.17.20.7): org.apache.spark.SparkException: Task failed while writing rows
at org.apache.spark.sql.execution.datasources.DefaultWriterContainer.writeRows(WriterContainer.scala:269)
at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation$$anonfun$run$1$$anonfun$apply$mcV$sp$3.apply(InsertIntoHadoopFsRelation.scala:148)
at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation$$anonfun$run$1$$anonfun$apply$mcV$sp$3.apply(InsertIntoHadoopFsRelation.scala:148)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.StackOverflowError
at java.util.HashMap.removeNode(HashMap.java:846)
at java.util.HashMap.remove(HashMap.java:798)
at java.util.HashSet.remove(HashSet.java:235)
at com.bmwcarit.barefoot.markov.KState.remove(KState.java:239)
at com.bmwcarit.barefoot.markov.KState.remove(KState.java:245)
at com.bmwcarit.barefoot.markov.KState.remove(KState.java:245)
at com.bmwcarit.barefoot.markov.KState.remove(KState.java:245)
...
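This appears to be caused by the recursive form of the KState.remove method: the repeated frames at KState.java:245 in the trace show the method calling itself until the thread's stack is exhausted. This PR rewrites the method to be non-recursive so that the error no longer occurs.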
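For readers unfamiliar with the pattern, here is a minimal sketch of turning a recursive predecessor-chain removal into a loop. It is not the actual KState code: the class `IterativeRemove`, its `Node` type, and the `counters` map are hypothetical stand-ins, and the sketch assumes each element has at most one predecessor and is dropped once no successor references it.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; names and structure are assumptions,
// not the barefoot KState implementation.
class IterativeRemove {
    /** A candidate that optionally points back to one predecessor. */
    static final class Node {
        final Node predecessor;
        Node(Node predecessor) { this.predecessor = predecessor; }
    }

    // For each stored node, the number of stored successors that reference it.
    private final Map<Node, Integer> counters = new HashMap<>();

    /** Stores a node and bumps the reference count of its predecessor, if stored. */
    void add(Node node) {
        counters.put(node, 0);
        if (node.predecessor != null) {
            counters.computeIfPresent(node.predecessor, (k, v) -> v + 1);
        }
    }

    /**
     * Removes a node, then walks back along the predecessor chain and removes
     * every predecessor whose reference count drops to zero. A loop replaces
     * the recursive call, so stack depth stays constant however long the chain is.
     */
    void remove(Node start) {
        Node current = start;
        while (current != null) {
            counters.remove(current);
            Node pred = current.predecessor;
            if (pred == null || !counters.containsKey(pred)) {
                break; // end of chain, or predecessor is not tracked
            }
            int remaining = counters.get(pred) - 1;
            counters.put(pred, remaining);
            // Continue only if nothing else references the predecessor.
            current = (remaining == 0) ? pred : null;
        }
    }

    int size() { return counters.size(); }

    public static void main(String[] args) {
        IterativeRemove states = new IterativeRemove();
        Node prev = null;
        // A chain far deeper than a recursive version could usually handle.
        for (int i = 0; i < 200_000; i++) {
            Node n = new Node(prev);
            states.add(n);
            prev = n;
        }
        states.remove(prev); // unwinds the whole chain without recursion
        System.out.println(states.size());
    }
}
```

Because each node has a single predecessor in this sketch, a plain loop is enough; a structure with several predecessors per node would need an explicit work stack (for example an `ArrayDeque`) instead.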
I also set a longer timeout for the server tests so that they pass in my environment. Although this change is not directly related to the fix, it allows all tests to pass in slower environments.