[SPARK-39866][SQL] Memory leak when closing a session of Spark ThriftServer
What changes were proposed in this pull request?
Fixed a memory leak caused by not clearing the FileStatusCache (created via FileStatusCache.getOrCreate) when closing a SparkSession of Spark Thrift Server.
Why are the changes needed?
When many sessions are created in Spark Thrift Server and queries are run in each session, the FileStatus objects of the files involved are cached. These FileStatus objects are not cleared when the sessions are closed, and the resulting memory leak can cause a Driver OOM.
Does this PR introduce any user-facing change?
No
How was this patch tested?
Tested by the steps in SPARK-39866.
jenkins, test this please
no jenkins now :)
Different sessions can share this cache. Maybe you can set
spark.sql.metadataCacheTTLSeconds
to a positive value to workaround this issue?
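For reference, the suggested TTL workaround would be applied when launching the Thrift Server, for example (the value 60 is illustrative, not a recommendation):

```shell
# Expire cached file metadata after 60 seconds (value is illustrative).
$SPARK_HOME/sbin/start-thriftserver.sh \
  --conf spark.sql.metadataCacheTTLSeconds=60
```

As the thread below notes, choosing this value is a trade-off between lookup performance and memory held by stale entries.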
@wangyum IIUC different sessions' file status caches are separate, but they share memory, so cleaning the file status cache as soon as a session is closed is also necessary: https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileStatusCache.scala#L157
Besides, the TTL seconds value is hard to set, as a small value will hurt performance while a large value will cause the same memory issue.
jenkins, test this please
no jenkins now :)
@LuciferYang thanks for reminding :) @Resol1992 maybe you need to check your github action settings?
This is a very good fix when using multiple connections to query the same table and then closing the connections, I don't think relying on TTL is enough.
Different sessions can share this cache. Maybe you can set
spark.sql.metadataCacheTTLSeconds
to a positive value to workaround this issue?
@wangyum Thanks for your advice. I tried to work around this by setting spark.sql.metadataCacheTTLSeconds = 10, but it does not work; the fileStatus objects still exist after the SparkSession is closed.
cc @yaooqinn @cloud-fan , could you also take a look?
We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable. If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!