
Spark Connect version 3.5.4

Open • RedwanAlkurdi opened this issue 11 months ago • 1 comment

Problem

Package management is a constant frustration for me, because I have to add the packages/jars to the pod through an init container. I would like the Spark Connect (sc) server to be updated to version 3.5.4, because Spark has added client-level package management. See: https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.SparkSession.addArtifact.html
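For illustration, here is a minimal sketch of what client-level package management could look like from the Python side. It assumes the `SparkSession.addArtifacts(*path, pyfile=False, archive=False, file=False)` method described in the linked PySpark documentation, which is only available on Spark Connect sessions. The helper name `add_client_artifacts`, the suffix mapping, and all file paths and URLs are hypothetical and not part of sparglim.

```python
from pathlib import Path

# Suffixes this sketch uploads as Python dependencies (pyfile=True).
# The mapping is an assumption made for the example, not a Spark rule.
PYFILE_SUFFIXES = {".py", ".zip", ".egg"}


def add_client_artifacts(session, paths):
    """Upload local artifacts through a Spark Connect session.

    `session` is any object exposing
    addArtifacts(*path, pyfile=False, archive=False, file=False),
    such as a Spark Connect SparkSession.
    Returns a list of (path, kind) tuples describing what was sent.
    """
    uploaded = []
    for path in paths:
        suffix = Path(path).suffix.lower()
        if suffix == ".jar":
            # JARs are uploaded without any flag.
            session.addArtifacts(path)
            kind = "jar"
        elif suffix in PYFILE_SUFFIXES:
            # Python sources and packages become importable in UDFs.
            session.addArtifacts(path, pyfile=True)
            kind = "pyfile"
        else:
            # Anything else is shipped as a plain file.
            session.addArtifacts(path, file=True)
            kind = "file"
        uploaded.append((path, kind))
    return uploaded


# Hypothetical usage against a running Spark Connect server:
#   from pyspark.sql import SparkSession
#   spark = SparkSession.builder.remote("sc://localhost:15002").getOrCreate()
#   add_client_artifacts(spark, ["libs/udfs.jar", "deps/helpers.py"])
```

With something like this, dependencies travel with the client session instead of being baked into the pod by an init container, which is the workflow this issue asks for.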

Proposed Solution

I would have done it myself. However, there are too many dependencies for me to figure out which packages need to be added. It would therefore be great if you had time to do it. Of course, if you don't, I totally understand.


RedwanAlkurdi, Feb 03 '25 16:02

Thank you for your interest in this project!

I’ve been away from Spark-related projects for quite some time, which is why this project hasn’t been updated. Currently, I’m focusing on LLM-related work, so unfortunately, I may not be able to provide much help.

That said, sparglim does not impose any restrictions on the PySpark version. You can install any version of the Spark components on any base image. You might find my base image Dockerfile useful as a reference; simply include the packages you need: https://github.com/Wh1isper/spark-build

Hope this helps!

Wh1isper, Feb 03 '25 17:02