neo4j-spark-connector
Pass parameters to the connector
Guidelines
Please note that GitHub issues are only meant for bug reports/feature requests. If you have questions on how to use the Neo4j Connector for Apache Spark, please ask on the Neo4j Discussion Forum instead of creating an issue here.
Feature description (Mandatory)
Add an option to pass query parameters to Neo4j.
I'm working on a Kedro integration for Neo4j, and this connector seems like a perfect fit. However, I want to define my queries in a Pythonic way using Pypher. This works fairly well, e.g.:
```python
from typing import Any

from kedro_datasets.spark import SparkDataset
from kedro_datasets.spark.spark_dataset import _get_spark


class Neo4JDataset(SparkDataset):
    ...

    def _load(self) -> Any:
        """Load the result of a Neo4j query as a Spark DataFrame."""
        spark_session = _get_spark()
        return (
            spark_session.read.format("org.neo4j.spark.DataSource")
            .option("database", self._database)
            .option("url", self._url)
            .options(**self._credentials)
            .option("query", str(self._load_query(self._query)))
            .load()  # without .load() this returns a DataFrameReader, not a DataFrame
        )


# with self._query an arbitrary Pypher object, e.g.,
# query = (
#     pypher.MATCH.node("person", labels="Person")
#     .rel_out(labels="LIKES")
#     .node("movie", "Movie")
#     .RETURN(__.person.__id__.ALIAS("p"))
# )
```
However, binding variables that come from the Kedro context is not possible, due to the inability to specify query parameters. Pypher already has a bound_params attribute that yields a nicely formatted dictionary of parameter names and values.
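For illustration, a minimal sketch of what Pypher exposes (following Pypher's documented API; the auto-generated placeholder names vary per instance, so the output shown is approximate):

```python
from pypher import Pypher

q = Pypher()
q.MATCH.node("person", labels="Person")
q.WHERE.person.property("name") == "Tom"  # comparison is captured as a bound parameter
q.RETURN.person

print(str(q))
# e.g. MATCH (person:`Person`) WHERE person.`name` = $NEO_<hash>_1 RETURN person
print(dict(q.bound_params))
# e.g. {'NEO_<hash>_1': 'Tom'}
```

With a way to hand this dictionary to the connector, the Cypher text and its parameters could travel together instead of being flattened into one string.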
The inability to specify parameters is rather awkward here, especially since predicate pushdown is disabled for the query option.
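To make the request concrete, here is a hedged sketch of what passing parameters could look like. The query.params option name is invented purely for illustration (no such option exists in the connector today), and spark is assumed to be an active SparkSession:

```python
import json

from pypher import Pypher

q = Pypher()
q.MATCH.node("person", labels="Person")
q.WHERE.person.property("name") == "Tom"
q.RETURN.person

df = (
    spark.read.format("org.neo4j.spark.DataSource")
    .option("url", "neo4j://localhost:7687")
    .option("query", str(q))
    # hypothetical option: forward Pypher's bound parameters so the connector
    # could submit them to Neo4j as real Cypher query parameters
    .option("query.params", json.dumps(dict(q.bound_params)))
    .load()
)
```

Without something along these lines, the bound values have to be interpolated back into the Cypher text before it reaches the query option, which discards parameter binding entirely.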
Considered alternatives
N/A
How can this feature improve the project?
Better adoption: integrations like the Kedro dataset above cannot bind runtime values without it.