Kai-Hsun Chen
cc. @Ngone51 @jiangxb1987
Hi @mridulm,

# Task Failure vs Non-Task Failure

We can categorize executor loss reasons into two categories.

(1) Task Failure: The network is good, but the task causes the executor's...
Hi @mridulm, take a case at Databricks as an example: we have observed that all the sockets for executor connections on the driver are closed for unknown reasons. Hence, the driver will...
Thanks @mridulm and @Ngone51 for the review!
cc. @Ngone51 @jiangxb1987
Thanks @mridulm for your recommendations! I will resolve these comments as soon as possible.
> Btw, any thoughts on this?
>
> > Are the changes here necessarily only for standalone? Why not k8s and yarn?
>
> The changes are...
Hi @mridulm, here are the JIRA tickets. Thank you!

YARN: https://issues.apache.org/jira/browse/SPARK-40068
k8s: https://issues.apache.org/jira/browse/SPARK-40069
Gentle ping @Ngone51 @mridulm
https://github.com/apache/spark/pull/37411/commits/92629e30410d7ae9741457240c3f1a789f6b042b

# Default values

I have revisited the configurations in this PR and updated their default values after the discussion with @Ngone51.

(1) initial delay: set initial delay to `executorTimeoutMs`...