[Bug]: AMS unexpectedly exits after a long full GC

Open link3280 opened this issue 1 year ago • 0 comments

What happened?

If the embedded Spark local terminal is enabled, the AMS process may exit when the Spark executor's heartbeats to the driver time out, which can happen during a minute-long full GC.

The related log is as follows:

ERROR [executor-heartbeater] [org.apache.spark.executor.Executor] [] - Exit as unable to send heartbeats to driver more than 60 times
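
For readers unfamiliar with the mechanism, here is a toy model of the kind of failure counter that the log message suggests. It is not Spark code. The class and function names are made up for illustration, the threshold of 60 is taken only from the log line above, and the reset-on-success behavior is an assumption about how the executor counts failures. The relevant point for AMS is that an executor in Spark local mode shares the JVM with its host process, so an executor-initiated exit takes the host process down with it.

```python
from dataclasses import dataclass


@dataclass
class HeartbeatMonitor:
    """Toy model of an executor-side heartbeat failure counter (hypothetical, not Spark code)."""

    max_failures: int = 60  # assumption: taken from the "more than 60 times" log message
    failures: int = 0

    def report(self, driver_responded: bool) -> bool:
        """Record one heartbeat attempt; return True if the process should exit."""
        if driver_responded:
            # Assumption: a successful heartbeat resets the consecutive-failure count.
            self.failures = 0
            return False
        self.failures += 1
        return self.failures >= self.max_failures


def attempts_until_exit(responses, max_failures=60):
    """Return the 1-based attempt at which the monitor asks to exit, or None if it never does."""
    monitor = HeartbeatMonitor(max_failures=max_failures)
    for attempt, ok in enumerate(responses, start=1):
        if monitor.report(ok):
            return attempt
    return None
```

In this model, a stall long enough to fail every attempt up to the threshold triggers the exit, while a single successful heartbeat in between starts the count over.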

Affects Versions

0.5.1

What table formats are you seeing the problem on?

Iceberg

What engines are you seeing the problem on?

AMS

How to reproduce

No response

Relevant log output

No response

Anything else

No response

Are you willing to submit a PR?

  • [X] Yes I am willing to submit a PR!

Code of Conduct

  • [X] I agree to follow this project's Code of Conduct

link3280 · Jul 18 '24 06:07