
Update py-pyspark and py-py4j

teaguesterling opened this issue 9 months ago • 2 comments

  • Update versions for py-pyspark
  • Add a default variant to py-py4j that enforces a Java dependency (disable it to use the system Java). Without this variant, py4j fails to initialize when Java is absent or incompatible with Spark (see the sketch after this list)
  • Add a variant and dependencies to py-pyspark that require Java via py4j
  • Make the explicit py4j and py-pyspark version tracking easier to read and update
  • Add dependencies as noted on the PySpark dependencies page: pyarrow, pandas, numpy, grpcio, grpcio-status, and googleapis-common-protos
  • Add new versions of py-pyarrow and arrow (for GCC 14 compatibility)
  • Bump the versions of three packages to meet Spark's expectations: grpcio, grpcio-status, and googleapis-common-protos
  • Add new protobuf versions (needed by the grpcio-status dependency) along with GCC 14 compatibility conflicts
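
A minimal sketch of how the py4j variant and the version tracking could look in the two package files. The version pins below are illustrative placeholders, not necessarily the values in this PR; each release's actual py4j pin lives in pyspark's setup.py:

```python
# py-py4j/package.py (sketch)
from spack.package import *


class PyPy4j(PythonPackage):
    """Enables Python programs to access Java objects in a JVM."""

    homepage = "https://www.py4j.org/"
    pypi = "py4j/py4j-0.10.9.7.tar.gz"

    variant(
        "java",
        default=True,
        description="Depend on a Spack-provided Java (disable to use the system Java)",
    )

    # py4j fails to initialize without a JVM, so pull one in by default.
    depends_on("java", type="run", when="+java")


# py-pyspark/package.py (sketch; separate file in practice)
class PyPyspark(PythonPackage):
    """Python bindings for Apache Spark."""

    homepage = "https://spark.apache.org"
    pypi = "pyspark/pyspark-3.5.1.tar.gz"

    # Keep the pyspark -> py4j pin in one easy-to-read, easy-to-update table.
    # Values here are illustrative placeholders.
    py4j_versions = {
        "3.5.1": "0.10.9.7",
        "3.4.1": "0.10.9.7",
        "3.3.2": "0.10.9.5",
    }

    for _pyspark, _py4j in py4j_versions.items():
        depends_on(
            f"py-py4j@{_py4j}",
            type=("build", "run"),
            when=f"@{_pyspark}",
        )
```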

Not included (but probably should be): allow py-pyspark to be built from source (or provided as a virtual package by spark built from source). One possible shape for the virtual-package route is sketched below.
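This sketch is speculative and assumes a new virtual name (here "pyspark") distinct from the concrete py-pyspark package, since a name cannot be both virtual and concrete in Spack:

```python
from spack.package import *


# spark/package.py (sketch): a source build of Spark ships its own PySpark.
class Spark(Package):
    variant("python", default=False, description="Expose the bundled PySpark")
    provides("pyspark", when="+python")


# py-pyspark/package.py (sketch): the PyPI distribution also satisfies
# the same virtual.
class PyPyspark(PythonPackage):
    provides("pyspark")


# Downstream packages would then depend on the virtual and let the
# concretizer pick either provider:
#     depends_on("pyspark")
```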

I'm not sure about the best way to set defaults for py4j and Java; one possible arrangement is sketched below.
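One option, sketched under the assumption that both packages carry a `java` variant, is to forward the variant from py-pyspark down to py-py4j so a single default controls both:

```python
from spack.package import *


# py-pyspark/package.py (sketch): forward the java default down to py4j.
class PyPyspark(PythonPackage):
    variant(
        "java",
        default=True,
        description="Require a Spack-provided Java through py-py4j",
    )

    # +java here implies +java on py-py4j; ~java falls back to
    # whatever Java is available on the system.
    depends_on("py-py4j+java", type=("build", "run"), when="+java")
    depends_on("py-py4j~java", type=("build", "run"), when="~java")
```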

teaguesterling · May 18 '24 23:05