sedona
Geostats Functions in Spark Connect
I don't think the stats functions are compatible with Spark Connect today. I tried this with Spark 3.5:
(python) ➜ python git:(graphframes-0.9.0) ✗ export SPARK_REMOTE=local
(python) ➜ python git:(graphframes-0.9.0) ✗ pytest -v tests/stats
and every test that wasn't skipped (for checkpointing) gave this kind of _jvm error:
self = <pyspark.sql.connect.session.SparkSession object at 0x16fd17df0>, name = '_jvm'

    def __getattr__(self, name: str) -> Any:
        if name in ["_jsc", "_jconf", "_jvm", "_jsparkSession"]:
>           raise PySparkAttributeError(
                error_class="JVM_ATTRIBUTE_NOT_SUPPORTED", message_parameters={"attr_name": name}
E   pyspark.errors.exceptions.base.PySparkAttributeError: [JVM_ATTRIBUTE_NOT_SUPPORTED] Attribute `_jvm` is not supported in Spark Connect as it depends on the JVM. If you need to use this attribute, do not use Spark Connect when creating your session.

../../../../.local/share/virtualenvs/python-GYLC1Bm8/lib/python3.10/site-packages/pyspark/sql/connect/session.py:692: PySparkAttributeError
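In case it helps while this is open, here is a minimal sketch of how the Python side could detect a Spark Connect session before touching `_jvm`. It is not how Sedona does anything today. Both helper names (`is_spark_connect_session`, `spark_remote_requested`) are hypothetical, and the check assumes only what the traceback above shows, namely that Connect sessions live under the `pyspark.sql.connect` module.

```python
import os


def is_spark_connect_session(spark) -> bool:
    """Return True when `spark` looks like a Spark Connect session.

    Hypothetical helper: it inspects only the module path of the session's
    class, which the traceback above reports as pyspark.sql.connect.session.
    """
    return type(spark).__module__.startswith("pyspark.sql.connect")


def spark_remote_requested(environ=os.environ) -> bool:
    """Return True when SPARK_REMOTE is set to a non-empty value.

    Hypothetical helper: mirrors the `export SPARK_REMOTE=local` step used
    to reproduce the failure.
    """
    return bool(environ.get("SPARK_REMOTE"))
```

Something like `pytest.mark.skipif(spark_remote_requested(), reason="stats functions need _jvm")` on the stats tests would at least turn the failures into explicit skips, and the session check could be used to raise a clearer error from the stats functions themselves until they stop depending on `_jvm`.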