
Feature request: grammar / `WHERE` namespace extensions

Open ntjess opened this issue 11 months ago • 8 comments

Use case: Cypher supports a robust library for datetime analysis, among other extensions. While it is unrealistic to expect all of these to be integrated here, it would be nice to allow Python equivalents in queries. I envision something like this:

import pandas as pd
from grandcypher import GrandCypher
GrandCypher(g, namespace={"datetime": pd.to_datetime, ...}).run("""
MATCH (a) --> (b)
WHERE a.date < datetime("2024-01-01")
RETURN a.name
""")

Would love to hear your thoughts.

ntjess avatar Dec 19 '24 20:12 ntjess

I was able to achieve something similar with this grammar modification:

%import python.expr_stmt -> python_expr

// Replaces current `where_clause` definition
where_clause: "where"i python_expr

And this implementation:

from lark.reconstruct import Reconstructor

# Add global items here such as "to_datetime": pd.to_datetime
WHERE_EXPRESSION_GLOBALS = {}
...


class _CypherNamespace(dict):
    """
    dot.notation access to dictionary attributes, useful for enabling cypher filtering
    syntax like `m.born < to_datetime("1990")`
    """

    def __getattr__(self, attr):
        out = self.get(attr)
        if isinstance(out, dict):
            return type(self)(out)
        return out

    __setattr__ = dict.__setitem__
    __delattr__ = dict.__delitem__

...

_CypherGrammar = Lark(..., maybe_placeholders=False)
reconstructor = Reconstructor(_CypherGrammar)

... 

# Update these CypherTransformer methods
def where_clause(self, where_clause: list[Tree]):
    self.where_string = reconstructor.reconstruct(where_clause[0])

def _new_where_condition(cname_value_map: dict, target_graph: nx.DiGraph, _):
    if not self.where_string:
        return True, []
    eval_locals = {
        cname: _CypherNamespace(target_graph.nodes[value])
        for cname, value in cname_value_map.items()
    }
    result = eval(self.where_string, WHERE_EXPRESSION_GLOBALS, eval_locals)
    return result, [result]

Currently, it assumes a hard-coded list of globals that the user can update with their own values.

It was a fun intro to lark 🙂

ntjess avatar Dec 24 '24 21:12 ntjess

@ntjess just wanted to pop in and tell you this looks so awesome. I need to take a closer look here (and in #59), but I'm unfortunately in the midst of my PhD dissertation proposal process and it's using up all my cycles for the next week or so. But I love what you did here! I'm brainstorming about how we can integrate this into the official codebase while keeping back-compat and vuln surface-area low!!

j6k4m8 avatar Dec 31 '24 16:12 j6k4m8

back-compat

One option is to add python_expr as the last alternative instead of replacing the current where matches. Queries that are valid Python expressions will behave the same either way. For queries that are not valid Python (like contains), Lark should resolve in favor of the legacy behavior.

vuln surface-area low

I've discovered pd.eval, which can help with this. It limits Python "control codes" like dot-access and module imports, but of course there will always be vulnerabilities associated with eval'ing Python code...

ntjess avatar Jan 02 '25 18:01 ntjess

I wonder if a pattern similar to what the python sqlite3 module does is one to consider. Check out this create_function documentation.
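For readers who haven't used it, here is a minimal sketch of the sqlite3 pattern being referenced. `Connection.create_function(name, narg, func)` is the real standard-library call; the `year_of` helper, the `person` table, and the sample rows are made up for illustration and have nothing to do with grand-cypher.

```python
import sqlite3


def year_of(iso_date: str) -> int:
    """Hypothetical helper: extract the year from an ISO 'YYYY-MM-DD' string."""
    return int(iso_date[:4])


conn = sqlite3.connect(":memory:")

# Register the Python callable under a SQL-visible name with exactly one argument.
# Only functions registered this way become callable from query text.
conn.create_function("year_of", 1, year_of)

conn.execute("CREATE TABLE person (name TEXT, born TEXT)")
conn.executemany(
    "INSERT INTO person VALUES (?, ?)",
    [("Ada", "1815-12-10"), ("Grace", "1906-12-09")],
)

# The query calls the registered function by name; no Python source is evaluated.
rows = conn.execute(
    "SELECT name FROM person WHERE year_of(born) < 1900"
).fetchall()
conn.close()
```

The point of the pattern is that the query language only ever sees names the caller explicitly registered, so the attack surface is the registered functions themselves rather than arbitrary Python.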

I'd be concerned from a security standpoint to introduce evals into the query syntax. Seems like a lot could go wrong there.
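To make the concern concrete, here is a small sketch of why an empty globals dict does not sandbox `eval`. It uses only standard Python behavior; the query string is a hypothetical stand-in for a malicious WHERE clause, and the `WHERE_EXPRESSION_GLOBALS` name is borrowed from the snippet earlier in this thread.

```python
# Same shape as the earlier snippet: an "empty" globals dict.
WHERE_EXPRESSION_GLOBALS = {}

# Hypothetical hostile WHERE text. When globals has no "__builtins__" key,
# eval inserts the real builtins, so __import__ is still reachable.
hostile = "__import__('os').getcwd()"
leaked = eval(hostile, WHERE_EXPRESSION_GLOBALS, {})

# eval mutated the dict: the builtins are now in there.
builtins_were_injected = "__builtins__" in WHERE_EXPRESSION_GLOBALS

# Supplying an empty "__builtins__" blocks this particular lookup...
try:
    eval(hostile, {"__builtins__": {}}, {})
    blocked = False
except NameError:
    blocked = True
# ...but it should not be treated as a real sandbox; other escape routes
# through object introspection are well documented.
```

So a hard-coded globals dict narrows what is convenient to call, not what is possible to call.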

davidmezzetti avatar Feb 14 '25 17:02 davidmezzetti

I'll also just drop here that if we support create_function / namespace style additions, we should totally make sure that there's a clear error message that disambiguates cases like:

  • this is not recognized syntax
  • this is a stdlib function but we haven't implemented it yet
  • this is not a recognized function (maybe you forgot a namespace/create_function?)
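A rough sketch of how the last two cases could be told apart at function-lookup time (the first case, unrecognized syntax, would surface earlier from the parser). Everything here is hypothetical: the exception classes, the `resolve_function` helper, and the two tables are illustrative names, not part of grand-cypher, and the listed Cypher function names are just a small sample.

```python
class UnimplementedFunctionError(NotImplementedError):
    """A known Cypher function that this library does not implement yet."""


class UnknownFunctionError(NameError):
    """A name that is neither a Cypher function nor user-registered."""


# Illustrative subset of Cypher function names (assumption: a real list would be longer).
KNOWN_CYPHER_FUNCTIONS = {"datetime", "date", "duration", "toUpper"}
# Hypothetical table of the ones that happen to be implemented.
IMPLEMENTED_FUNCTIONS = {"toUpper": str.upper}


def resolve_function(name: str, user_functions: dict):
    """Look up a callable, raising a distinct error for each failure mode."""
    if name in user_functions:  # user registrations take precedence
        return user_functions[name]
    if name in IMPLEMENTED_FUNCTIONS:
        return IMPLEMENTED_FUNCTIONS[name]
    if name in KNOWN_CYPHER_FUNCTIONS:
        raise UnimplementedFunctionError(
            f"{name}() is a Cypher function but is not implemented yet"
        )
    raise UnknownFunctionError(
        f"{name}() is not a recognized function "
        "(maybe you forgot to register it in a namespace?)"
    )
```

With this shape, `resolve_function("datetime", {})` and `resolve_function("frobnicate", {})` fail with different exception types and messages, while passing `{"datetime": some_callable}` makes the first one succeed.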

j6k4m8 avatar Feb 15 '25 19:02 j6k4m8

I'd prefer that we have better control over the functions rather than eval'ing any python_expr.

If I understand correctly, we are going to have custom namespace functions that can be used in the WHERE clause, especially on the "value" side. I think the parser itself can support a namespace_functions setting for the WHERE clause.
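As a rough illustration of the idea (not how grand-cypher's Lark parser does or would do it), here is a sketch that evaluates nested function calls over literals without `eval`, by walking Python's own `ast` and allowing only registered names. The `eval_call` helper and the sample registry are hypothetical.

```python
import ast
from datetime import date


def eval_call(expr: str, functions: dict):
    """Evaluate literals and calls to registered functions; reject everything else."""

    def visit(node):
        if isinstance(node, ast.Constant):  # numbers, strings, etc.
            return node.value
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)  # plain names only, no dot-access
            and not node.keywords
        ):
            if node.func.id not in functions:
                raise NameError(f"function {node.func.id!r} is not registered")
            args = [visit(arg) for arg in node.args]
            return functions[node.func.id](*args)
        raise ValueError(f"unsupported expression: {type(node).__name__}")

    return visit(ast.parse(expr, mode="eval").body)


# Hypothetical registry, analogous to a namespace_functions setting.
registry = {"date": date, "int": int, "add": lambda a, b: a + b}
value = eval_call('date(int("2025"), add(5, 7), 1)', registry)
```

Attribute access, imports, comprehensions, and unregistered names are all rejected before anything runs, which is the kind of control that plain `eval` does not give.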

khoale88 avatar Oct 29 '25 16:10 khoale88

Agreed! Possibly the best way to do this is at the hints level; for more fine-grained Python interweaving into queries it might make sense to work with grandiso directly!

j6k4m8 avatar Oct 29 '25 19:10 j6k4m8

hi @j6k4m8 , I'm not sure how to link this issue with a MR. I'm going to paste the MR here.

Basically, I support a scope_functions setting in the GrandCypher initialization.

def test_nested_functions(self, graph_type):
    host = graph_type()
    host.add_node(1, d=date(2025, 10, 31))
    host.add_node(2, d=date(2025, 11, 30))
    host.add_node(3, d=date(2025, 12, 31))

    qry = """
    MATCH (A)
    WHERE A.d < date(int("2025"), add(5, 7), 1)
    RETURN ID(A)
    """

    res = GrandCypher(host, scope_functions={"date": date, "int": int, "add": lambda a, b: a + b}).run(qry)
    assert res["ID(A)"] == [1, 2]

https://github.com/aplbrain/grand-cypher/pull/80

khoale88 avatar Oct 31 '25 13:10 khoale88