sea-orm icon indicating copy to clipboard operation
sea-orm copied to clipboard

Aggregate APIs

Open tyt2y3 opened this issue 3 years ago • 33 comments

Add a new trait AggregatorTrait to be impl by Selector* count() and sum avg max min by specifying a column to aggregate on

tyt2y3 avatar Jul 15 '21 09:07 tyt2y3

56763a8718550b7a5d08b15489697f366dcd1bf5

tyt2y3 avatar Jul 17 '21 14:07 tyt2y3

So, it allows user to aggregate by column(s) / expression on any of the supported aggregation method (i.e. sum, avg, max, min)?

billy1624 avatar Aug 13 '21 10:08 billy1624

My idea was to have a struct called Aggregator that will perform the action, as in Paginator. The sum method on the AggregatorTrait will convert self into an Aggregator and execute in async. The parameter to sum should be a ColumnTrait

tyt2y3 avatar Aug 13 '21 15:08 tyt2y3

How about "Select A, SUM(B) AS B FROM Table GROUP BY A"?

billy1624 avatar Aug 13 '21 15:08 billy1624

I think it should work similar to Paginator, creating a sub query is the safest

tyt2y3 avatar Aug 13 '21 15:08 tyt2y3

So the result is just a simple type T, perhaps a number in i32, i64, f32, f64, etc.

billy1624 avatar Aug 13 '21 15:08 billy1624

So the result is just a simple type T, perhaps a number i32, i64, f32, f64, etc.

Oh, not yet thought of this. No, it can't be type T. We need to query the column type in runtime and select the appropriate Value variant

tyt2y3 avatar Aug 13 '21 15:08 tyt2y3

Btw... we should consider changing the return type of Paginator::count(), make it more dynamic to account for db differences.

e.g. Postgres prefer i64; MySQL is okay with both i32 & i64

billy1624 avatar Aug 13 '21 15:08 billy1624

Btw... we should consider changing the return type of Paginator::count(), make it more dynamic.

I think the result of count is not dependent on the column type, only dependent on the database driver. Using i64 is the safest bet.

tyt2y3 avatar Aug 13 '21 15:08 tyt2y3

Yes!

billy1624 avatar Aug 13 '21 15:08 billy1624

So all these methods (count, sum, avg, max, min) return single value as result?

So the result is just a simple type T, perhaps a number i32, i64, f32, f64, etc.

Oh, not yet thought of this. No, it can't be type T. We need to query the column type in runtime and select the appropriate Value variant

What you mean by selecting an appropriate Value variant at runtime?

billy1624 avatar Aug 14 '21 09:08 billy1624

Well, if the column itself is a Double, then the avg result would be f64. It could be decimal, i32 and so on. It is different from count that always return an i64.

tyt2y3 avatar Aug 14 '21 15:08 tyt2y3

Yes, think so!

Generic types loll

billy1624 avatar Aug 14 '21 15:08 billy1624

To whom it may concern, if it is important to you, make a comment.

tyt2y3 avatar Sep 24 '21 10:09 tyt2y3

I'm moving from diesel to this crate, and love it! Although I'm using count https://docs.diesel.rs/1.4.x/diesel/dsl/fn.count.html and would like it if it was implemented here too. I can help implementing it, if someone could help me/indicate what to do :)

BenJeau avatar Jan 17 '22 17:01 BenJeau

Hey @BenJeau, thanks for the help!

I have a rough idea in mind on how this could be done. It would be similar to PaginatorTrait.

https://github.com/SeaQL/sea-orm/blob/a517e9db1aa8447300c25c3cf7fcf06724d0b698/src/executor/paginator.rs#L183-L202

Let say we create an AggregatorTrait then implement it for...

  • Selector<S>
  • Select<E>
  • SelectTwo<E, F>
#[async_trait::async_trait]
pub trait AggregatorTrait
{
    fn aggregate(self, db: &'db C) -> Aggregator;
}

pub struct Aggregator<'db, C>
where
    C: ConnectionTrait,
{
    pub(crate) query: SelectStatement,
    pub(crate) db: &'db C,
}

impl<'db, C> Aggregator<'db, C> {
    pub fn count<T>(col: T) -> Result<i64, DbErr>
    where
        T: ColumnTrait
    {
        todo!()
    }
}

So, the public API would be

let count: i64 = cake::Entity::find()
    .aggregate(db)
    .count(cake::Column::Id)
    .await?;

And we could use subquery to select the aggregated result, it would be easier to for us to implement.

  • https://docs.rs/sea-query/0.20.0/sea_query/query/struct.SelectStatement.html#method.from_subquery
SELECT COUNT(sub.id) FROM (SELECT cake.id, cake.name FROM cake) AS sub

billy1624 avatar Jan 18 '22 03:01 billy1624

Is there still interest in this feature being implemented?

kirawi avatar Apr 23 '22 00:04 kirawi

I think so, it's marked as a good first issue anyway

tyt2y3 avatar Apr 24 '22 10:04 tyt2y3

I'll give this issue a shot in the coming days.

kirawi avatar May 16 '22 20:05 kirawi

Is this issue still active @billy1624 ?

LemarAb avatar Dec 25 '22 16:12 LemarAb

If nobody minds, I would like to claim this task :) .

LemarAb avatar Dec 26 '22 14:12 LemarAb

@billy1624 Following your suggestion, it seems that you would like the count function to actually return the number of rows. Do I assume corrrectly that sea-orm currently does not provide a way to access counts for tables?

LemarAb avatar Dec 28 '22 18:12 LemarAb

Hey @LemarAb, we do have methods to construct aggregate expression in SeaORM. https://docs.rs/sea-orm/latest/sea_orm/entity/prelude/trait.ColumnTrait.html#method.count

billy1624 avatar Jan 03 '23 09:01 billy1624

@billy1624 so there is no intention to make the APIs compatible with further querying, as we return i64 immediately, only (one) aggregation should be performed per aggregate() call, correct? So it's just a way to make aggregation easier?

Can you elaborate on your previous comment:

And we could use subquery to select the aggregated result, it would be easier to for us to implement.

  • https://docs.rs/sea-query/0.20.0/sea_query/query/struct.SelectStatement.html#method.from_subquery
SELECT COUNT(sub.id) FROM (SELECT cake.id, cake.name FROM cake) AS sub

Why is there a necessity for a subquery ?

LemarAb avatar Jan 06 '23 10:01 LemarAb

Hey @LemarAb, I saw it the other way round. User can perform filtering or even pre-aggregating, then aggregate it to produce the final result.

SELECT COUNT(sub.id)
FROM (SELECT cake.id, cake.name FROM cake WHERE ... GROUP BY ... HAVING ...) AS sub

Ideally we wrap the original select statement in a subquery because this way we always perform aggregation on a query result of a black box.

I imagine the API be like:

let count: i64 = cake::Entity::find()
    .count(cake::Column::Id) // method defined in `AggregatorTrait` to construct the `Aggregator` struct
    .one(db) // method defined in `Aggregator` struct
    .await?;

billy1624 avatar Jan 11 '23 04:01 billy1624

Okay @billy1624 , we are on the same page when it comes to pre-filtering, but I am concerned with what comes after the count()call.

Your proposal further above intended an i64 to return as a result of the count method...what methods should be callable after that? As far as I am aware, we can implement the call to the db connection in the count method, just like we already do in fn paginate. So why the need to use theone() method?

LemarAb avatar Jan 11 '23 14:01 LemarAb

The one() method will run the query and retrieve the result as you might have guessed. I want it to be a two steps process and leave room for API addition in the future.

let count: i64 = cake::Entity::find()
    .count(cake::Column::Id) // 1) Convert into `Aggregator`
    .some_more_fn(...) // 1.x) Leave room for future API addition
    .one(db) // 2) Run the query
    .await?;

Does that make sense?

billy1624 avatar Jan 12 '23 10:01 billy1624

Why not replace some_more_fn with and_then.

Ref :and_then

baoyachi avatar Jan 12 '23 10:01 baoyachi

@billy1624 so opposite to what you suggested further above, count should not return the number right away, but return Aggregator to leave room for further aggregation APIs? Are there any functions supposed to be callable right now after count aside from one()?

LemarAb avatar Jan 14 '23 11:01 LemarAb

For example we can implement QuerySelect for Aggregator.

billy1624 avatar Jan 14 '23 12:01 billy1624