redis-om-python icon indicating copy to clipboard operation
redis-om-python copied to clipboard

Needs to support `FT.AGGREGATE` / RediSearch aggregations.

Open simonprickett opened this issue 2 years ago • 7 comments

This client needs to support RediSearch aggregations / FT.AGGREGATE.

simonprickett avatar May 20 '22 16:05 simonprickett

I would like to work on this issue, however, i have a question related to the naming of the method, wanna call it aggregate? ~~Wanna support aggregations on both rows and columns or just columns?~~ Something like:

model.aggregate({model.field1: ['max', 'min'], ...})

wiseaidev avatar Aug 08 '22 12:08 wiseaidev

@wiseaidev It would be really awesome if you could contribute it!

The name of the methods should be aggregate. I also thought to implement it and in my opinion the API should be something like this:

model.find(query).aggregate(max=model.FieldName, min=model.FieldName2)

But if you think otherwise, suggest your idea and let's discuss it.

I'm not sure I understood your question about the rows / columns but I think we only want columns.

dvora-h avatar Aug 10 '22 11:08 dvora-h

It's a different programming paradigm, but OM .NET has aggregations, might be worth a look at how Steve did that https://github.com/redis/redis-om-dotnet#-aggregations

simonprickett avatar Aug 10 '22 12:08 simonprickett

Hey @dvora-h. As you may know, the thing is that the aggregate function is a whole new query and not related/ doesn't require a find query. You can refer to this example on how the query looks like. So, i think the API should look like the following:

model.aggregate(model.field1= aggregate_function)

However, it becomes challenging when you want to apply multiple aggregate functions on the same field. So, in an ideal world, the function should look like, as i described above:

model.aggregate({model.field1: ['aggregate_function1', 'aggregate_function2', ..., 'aggregate_functionN'], ...})

In RedisSearch, the query would look like this:

FT.AGGREGATE GivenIndex "*" GROUPBY 1 @field1 REDUCE aggregate_function1 1 field1 AS value1 REDUCE aggregate_function2 1 field1 AS value2 ...

I'm not sure I understood your question about the rows/columns but I think we only want columns.

I meant this example. You can either pass 0 to compute the aggregation function on all columns, or 1 for a specific property/column.

Edit:

My initial thought was to try the DRY principle. So I went into the code, tried to make use of the FindQuery class, and add an additional boolean variable that indicates the query is either a find or aggregate. And in the execute function, you can do something like;

    async def execute(self, exhaust_results=True):
        if self.aggregate:
            args = ["ft.aggregate", self.model.Meta.index_name, self.query, *self.pagination]
        else:
            args = ["ft.search", self.model.Meta.index_name, self.query, *self.pagination]

However, the problem now is related to the query syntax:

{model.field1: ['aggregate_function1', 'aggregate_function2', ..., 'aggregate_functionN'}

Which requires a new ExpressionProxy. So i tried to implement the following:

@dataclasses.dataclass
class AggregationExpression:
    left: ModelField
    right: List[RediSearchAggregationReducers]

    @property
    def name(self):
        return str(self.left)

    def __hash__(self) -> int:
        return hash(self.left)

and the supported methods in RedisSearch are:

class RediSearchAggregationReducers(Enum):
    COUNT = "COUNT"
    COUNT_DISTINCT = "COUNT_DISTINCT"
    COUNT_DISTINCTISH = "COUNT_DISTINCTISH"
    SUM = "SUM"
    MIN = "MIN"
    MAX = "MAX"
    AVG = "AVG"
    STDDEV = "STDDEV"
    QUANTILE = "QUANTILE"
    TOLIST = "TOLIST"
    FIRST_VALUE = "FIRST_VALUE"
    RANDOM_SAMPLE = "RANDOM_SAMPLE"

    def __str__(self):
        return str(self.name)

What do you think?

Edit1:

Most Likely we should implement a whole new class called AggregateQuery.

Edit2:

This query is relatively easy to implement:

model.aggregate(model.FieldName1=max, model.FieldName1=min,..)

But, it is not ideal, you have to pass the same field multiple times if you want to apply multiple aggregation functions on the same field. The following one is the holy grail:

model.aggregate({model.field1: ['aggregate_function1', 'aggregate_function2', ..., 'aggregate_functionN'], ...})

I will try to implement the easy one.

wiseaidev avatar Aug 10 '22 12:08 wiseaidev

Big fan of the second approach, and ensuring that we can do chained aggregations. This would both feel very much in spirit with this library, and align really well to other Python ORMs (I'm looking at Django!).

It feels "strange" that we're not aggregating a query, if only because traditionally those things march in tandem.

One syntax I'm a fan of is:

model.agggregate(Avg=model.Fieldname, Sum=model.FieldName)

This makes chaining a little cleaner. Through approach this may eventually lead to class based aggregations. My biggest concern is that it feel "pythonic". The downside from approaches like this are reserved names, leading to Capitalized variables.

@wiseaidev ?

chayim avatar Aug 11 '22 10:08 chayim

ensuring that we can do chained aggregations.

I recently started to realize that. That's right!

The downside from approaches like this are reserved names, leading to Capitalized variables.

How about passing the aggregate functions as strings:

model.agggregate(model.Fieldname='sum', model.FieldName='min')

The user can enter whatever they like, however in the code, you can always do .lower thing.

Edit:

I think it requires a new proxy expression, something like:

@dataclasses.dataclass
class AggregationExpression:
    left: Optional[ExpressionOrModelField]
    right: RediSearchAggregationReducers
    parents: List[Tuple[str, "RedisModel"]]

Similar functionality to FindQuery, but here, we have = instead of ==.

wiseaidev avatar Aug 11 '22 15:08 wiseaidev

@wiseaidev Any chance this is coming along soon?

XChikuX avatar May 16 '23 06:05 XChikuX