pony icon indicating copy to clipboard operation
pony copied to clipboard

Implement Custom SQL Functions (incuding Aggregates and Collation Funcs.)

Open SpaceshipOperations opened this issue 5 years ago • 1 comments

Hi, I find Pony ORM is missing one very important feature, which is the ability to register custom SQL functions. As you know, the sqlite3 DBAPI exposes Connection methods like create_function(), create_aggregate(), create_collation(), which allow you to use user-defined functions within SQL.

While it's possible to register custom functions on the SQLite connection and then use them via raw_sql (See first comment for examples), it would be nice if Pony offered the ability for them to be registered in Pony itself and then used in its usual lambda / generator expression syntax, just like any other function offered by Pony itself (such as max).

For example:

def regex_match(pattern: str, string: str) -> bool:
    return bool(re.match(pattern, string))

db.register_function('regex_match', regex_match, 2)  # 2 is the arity
# or whatever are the arguments Pony would need

with db_session:
    pattern = '^abc-[0-9]+$'
    query = Track.select(lambda t: regex_match(pattern, t.title))
    # or:
    query = select(t for t in Track if regex_match(pattern, t.title))
    for track in query:
        print(track.title)

And this would result in the following SQL:

SELECT t.id, t.title, ...
FROM track as t
WHERE regex_match(?, t.title);
["^abc-[0-9]+$"]

And where the database engine would invoke the Python function associated with regex_match which was registered by PonyORM when db.register_function() was called.

For collation sequences, the registration would be similar, but using them would probably be via the sort_by() method of query objects. For example:

def natural_sort(a: str, b: str) -> int:
    """
    Sort strings containing numbers in natural order.
    For example: "abc-1" < "abc-2" < "abc-11",
    As opposed to: "abc-1" < "abc-11" < "abc-2"
    """
    if a == b:
        return 0
    if ...:
        return +1
    if ...:
        return -1

db.register_collation('natural_sort', natural_sort)

with db_session:
     query = Track.select().sort_by(lambda t: t.title, collate=natural_sort)
     for track in query:
         print(track.title)

Which would result in:

SELECT t.id, t.title, ...
FROM track as t
ORDER BY t.title COLLATE natural_sort;

Any response would be appreciated. And thank you for writing this wonderful library!

SpaceshipOperations avatar Nov 27 '20 09:11 SpaceshipOperations

It should be noted that using custom functions does have a workaround, by registering them on the database connection and then invoking them with raw_sql:

def regexp(pattern: str, string: str) -> bool:
    return re.search(pattern, string, re.I) is not None

@db.on_connect(provider='sqlite')
def on_connect(db, connection):
    connection.create_function('regexp', 2, regexp)

with db_session:
    pattern = '^abc-[0-9]+$'
    query = Track.select(lambda t: raw_sql("regexp($(pattern), t.title)"))
    # or, as a special case for the function named "regexp",
    # the REGEXP operator syntax can be used:
    query = Track.select(lambda t: raw_sql("t.title REGEXP $(pattern)"))
    for track in query:
        print(track.title)

Which translates into:

-- first case:
SELECT t.id, t.title, ...
FROM track as t
WHERE regexp(?, t.title);
["^abc-[0-9]+$"]

-- second case:
SELECT t.id, t.title, ...
FROM track as t
WHERE t.title REGEXP ?;
["^abc-[0-9]+$"]

Likewise, collation sequences can be registered, then we can use the order_by() method and give it a raw SQL argument, inside which we can sneak the COLLATE clause:

@db.on_connect(provider='sqlite')
def on_connect(db, connection):
    connection.create_collation("natural_sort", natural_sort)

with db_session:
     query = Track.select(lambda t: True) \
         .order_by(raw_sql("t.title COLLATE natural_sort"))
     for track in query:
         print(track.title) 

Which would result in:

SELECT t.id, t.title, ...
FROM track as t
ORDER BY t.title COLLATE natural_sort;

It does work, but there are some caveats. First, the column name has to be hard-coded into the raw SQL string because SQLite does not support placeholders (like ?) for column names. This means that if the column name will be filled in from a variable (possibly obtained from user input), the programmer must obviously sanitize it/ensure that it actually refers to an existing column/attribute before adding it to the string.

If Pony added support for user-defined functions and collations as described in the issue's OP, then stuff like lambda t: regexp(pattern, getattr(t, field)) and query.sort_by(getattr(Track, field), collate=natural_sort if field in NATURALLY_SORTED_FIELDS else None) becomes possible, with Pony taking care of validating and filling in column names as usual, making life easier.

Also, while at it, for the regexp operator as a special case, it would be nice if Pony's entity attribute objects offered a regexp method that can be used like lambda t: t.title.regexp(pattern) and results in SQL t.title REGEXP arg. Same for the LIKE operator, i.e. t.title.like(pattern) -> t.title LIKE pattern, as Pony does not have that, either. (Come to think of it, I probably should open a separate issue for those.) While Pony itself does not have to take care of implementing the actual regexp function (and it probably shouldn't, as there are various implementations for it, including native C ones, and the programmer may prefer to load a C extension instead of registering a Python implementation), the presence of a regexp attribute gives syntactic convenience for using this operator, which I believe is common enough to care about. But of course, the whole regexp thing is just a special case. The general case of using any user-defined function should be supported too (and is arguably even more important).

SpaceshipOperations avatar Nov 27 '20 14:11 SpaceshipOperations