[SPARK-49854][SQL] Clone artifact manager during session clone

Open xupefei opened this issue 1 year ago • 2 comments

What changes were proposed in this pull request?

This PR implements a proper clone mechanism for ArtifactManager during Spark session cloning. The cloned manager gets a fresh copy of all of the parent's resources.

During a clone, cached relations, classes, JARs, and Python artifacts are copied to the new instance.
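
To illustrate what "a fresh copy" means here, the following is a minimal Python sketch, not Spark's actual implementation. `ToyArtifactManager`, its `root` directory, and its `cached_relations` dict are hypothetical names invented for this example; the real ArtifactManager lives in Spark's Scala code and differs in detail. The sketch only shows the idea that the clone receives copies of the parent's files and cache entries, so later changes on either side do not leak into the other.

```python
import shutil
import tempfile
from pathlib import Path


class ToyArtifactManager:
    """Hypothetical stand-in for a per-session artifact store (not Spark's class)."""

    def __init__(self, root=None):
        # Each manager owns its own directory for JARs, classes, and Python files.
        self.root = Path(root) if root else Path(tempfile.mkdtemp(prefix="artifacts-"))
        # Assumed in-memory cache, keyed by an illustrative relation id.
        self.cached_relations = {}

    def add_artifact(self, name, data):
        # Write the artifact bytes under this manager's own directory.
        path = self.root / name
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(data)

    def clone(self):
        # Copy the files into a new directory so the clone shares no paths with the parent.
        new_root = Path(tempfile.mkdtemp(prefix="artifacts-clone-"))
        shutil.copytree(self.root, new_root, dirs_exist_ok=True)
        cloned = ToyArtifactManager(new_root)
        # Copy the cache mapping so entries added later stay local to one side.
        cloned.cached_relations = dict(self.cached_relations)
        return cloned


parent = ToyArtifactManager()
parent.add_artifact("jars/udf.jar", b"jar-bytes")
parent.cached_relations["rel-1"] = "cached-plan"

child = parent.clone()
# Added only to the child; the parent's directory is unaffected.
child.add_artifact("pyfiles/helper.py", b"print('hi')")
```

In this sketch the child sees `jars/udf.jar` and `rel-1` from the parent, while `pyfiles/helper.py` exists only under the child's directory.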

Why are the changes needed?

Before this PR, cloning a Spark session did not carry the parent's artifacts over to the cloned session.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

New test.

Was this patch authored or co-authored using generative AI tooling?

No.

xupefei avatar Oct 02 '24 08:10 xupefei

@vicennial This PR changed after your stamp. Could you review the change again? @hvanhovell Please take a look :D

xupefei avatar Oct 18 '24 08:10 xupefei

Good to have this to unblock the streaming use case!

haiyangsun-db avatar Oct 21 '24 22:10 haiyangsun-db

LGTM - will merge after CI

hvanhovell avatar Oct 23 '24 14:10 hvanhovell

The CI failure seems unrelated. Triggering a re-run.

  File "/__w/spark/spark/python/pyspark/install.py", line 166, in install_spark
    raise OSError("Unable to download %s." % pretty_pkg_name)
OSError: Unable to download spark-3.0.1 for Hadoop hadoop3.2.

xupefei avatar Oct 23 '24 16:10 xupefei

Merging to master. Thanks!

hvanhovell avatar Oct 24 '24 01:10 hvanhovell