
[SPARK-39866][SQL] Memory leak when closing a session of Spark ThriftServer

Resol1992 opened this pull request 2 years ago • 8 comments

What changes were proposed in this pull request?

Fixed a memory leak caused by not clearing the FileStatusCache (created via FileStatusCache.getOrCreate) when closing a SparkSession of Spark Thrift Server.
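The idea of the fix can be sketched as follows. This is a simplified sketch, not the actual patch: `closeSession` is a hypothetical stand-in for the Thrift Server's session-close path, while `FileStatusCache.getOrCreate` and `invalidateAll` are the existing Spark internals referenced in this PR.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.execution.datasources.FileStatusCache

// Hypothetical session-close hook: when the Thrift Server tears down a
// session, also drop the FileStatus entries cached for that session so
// the driver does not accumulate them across many short-lived sessions.
def closeSession(session: SparkSession): Unit = {
  FileStatusCache.getOrCreate(session).invalidateAll()
  session.close()
}
```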

Why are the changes needed?

When many sessions are created in Spark Thrift Server and queries are run in each session, the FileStatus of the files involved is cached. These FileStatus entries are not cleared when the sessions are closed, and the resulting memory leak can cause a Driver OOM.

Does this PR introduce any user-facing change?

No

How was this patch tested?

Tested by the steps in SPARK-39866.

Resol1992 avatar Aug 04 '22 06:08 Resol1992

jenkins, test this please

wzhfy avatar Aug 04 '22 07:08 wzhfy

jenkins, test this please

no jenkins now :)

LuciferYang avatar Aug 04 '22 15:08 LuciferYang

Different sessions can share this cache. Maybe you can set spark.sql.metadataCacheTTLSeconds to a positive value to work around this issue?

@wangyum IIUC each session's file status cache is separate, but they share driver memory, so cleaning the file status cache as soon as a session closes is still necessary: https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileStatusCache.scala#L157

Besides, the TTL is hard to tune: a small value will hurt performance, while a large value will cause the same memory issue.

wzhfy avatar Aug 05 '22 03:08 wzhfy
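For reference, the TTL-based workaround discussed above means setting `spark.sql.metadataCacheTTLSeconds` (a real Spark SQL config) to a positive value, for example when starting the Thrift Server; the value 300 below is illustrative only:

```
./sbin/start-thriftserver.sh --conf spark.sql.metadataCacheTTLSeconds=300
```

As the comments note, this only ages entries out after the TTL; it does not free the cache at session close, which is what this PR addresses.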

jenkins, test this please

no jenkins now :)

@LuciferYang thanks for reminding :) @Resol1992 maybe you need to check your github action settings?

wzhfy avatar Aug 05 '22 03:08 wzhfy

This is a very good fix for the case of using multiple connections to query the same table and then closing the connections; I don't think relying on the TTL is enough.

Zhangshunyu avatar Aug 05 '22 07:08 Zhangshunyu

Different sessions can share this cache. Maybe you can set spark.sql.metadataCacheTTLSeconds to a positive value to workaround this issue?

@wangyum Thanks for your advice. I have tried to work around this by setting spark.sql.metadataCacheTTLSeconds = 10, but it does not work; the FileStatus objects still exist after the SparkSession is closed.

Resol1992 avatar Aug 05 '22 09:08 Resol1992

cc @yaooqinn @cloud-fan , could you also take a look?

wzhfy avatar Aug 09 '22 11:08 wzhfy

We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable. If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!

github-actions[bot] avatar Dec 12 '22 00:12 github-actions[bot]