Error when using window function
Describe the bug
I'm trying to write a query using a window function, but receive an error Aggregate functions are not allowed in WHERE.
aggregate functions are not allowed in WHERE
🐜 This seems to be an execution error, which means that your request syntax seems okay,
but the resulting statement cannot be executed → Probably not a pg-mem error.
*️⃣ Reconsituted failed SQL statement: SELECT (sum ((count (*) )) over ( ORDER BY user_created_at ) ) AS count , user_created_at AS year FROM users WHERE (user_created_at >= ('2020-01-01 00:00:00.000+00')) GROUP BY user_created_at ORDER BY year
This seems to work fine outside of pg-mem.
For context, I'm want to receive a cumulative total of users created, grouped by their creation date. I've simplified the query greatly for the sake of this bug, but ultimately want to end up with something like:
| year | total |
|---|---|
| 2020 | 10 |
| 2021 | 20 |
| 2022 | 30 |
Where there were 10 users added each year. In my actual query I'm using date_part to get the year, but this isn't supported out of the box with pg-mem so I've left it off the example. I was able to add that function in using registerFunction but wanted to have something that you could reproduce in the playground, and for the sake of this bug, date_part isn't relevant.
To Reproduce
create table users ( "user_id" text, "user_created_at" timestamptz default current_timestamp );
insert into users (user_id) values ('first'), ('second'), ('third');
SELECT
SUM(COUNT(*)) OVER(ORDER BY user_created_at) as count,
user_created_at
FROM users
WHERE user_created_at >= '2020-01-01 00:00:00.000+00'
GROUP BY user_created_at
ORDER BY user_created_at
pg-mem version
2.3.3
Hi !
Thanks for the report (and sorry for the late reply).
"OVER" keyword is not supported yet in pg-mem. It will come, but I cant give an ETA yet :/
That said, correct me if I'm wrong, but it seems that your statement above doesnt need an "OVER" clause, it seems equivalent to:
SELECT
COUNT(*) as count,
user_created_at
FROM test
WHERE user_created_at >= '2020-01-01 00:00:00.000+00'
GROUP BY user_created_at
ORDER BY user_created_at
Or do you have simplified a more complex request ?
Anyway, I've improved the corresponding error message in the release to come so others wont be confused :)
No worries, thanks for the update all the same. For the time being I've just stubbed out the query in my tests.
I have simplified a more complex query but even with the simplified example the difference is subtle but distinct. Using OVER in the way above will allow me to get the cumulative total. E.g. given the following table ( simplified for brevity ):
| user_id | user_created_at |
|---|---|
| id-1 | 2020-01-01 |
| id-2 | 2020-01-01 |
| id-3 | 2020-01-02 |
| id-4 | 2020-02-01 |
Without OVER, I would get the following:
| count | user_created_at |
|---|---|
| 2020-01-01 | 2 |
| 2020-01-02 | 1 |
| 2020-02-01 | 1 |
Which is the count of users created grouped by each day. E.g. 2 signed up on the 1st of Jan, 1 on the 2nd of Jan, etc
However, using count I'll get the cumulative running total:
| count | user_created_at |
|---|---|
| 2020-01-01 | 2 |
| 2020-01-02 | 3 |
| 2020-02-01 | 4 |
Which is helpful to see how the total user base has grown over time. E.g. on the 1st of Jan we had 2 total users, on the 2nd of Jan we had 3 total users, etc.
Of course I could easily do the former and sum it up myself with a function afterwards, but it's nice and easy to just query it straight from the DB.
thanks for the details
I was looking at using pg-mem for tests and while looking into another issue found this one. Unfortunately OVER is the way we handle pagination for most of our API queries, so missing this means we can't use this great project :(. Just commenting for more context that this is a pretty common pagination + total pages technique, I'll keep an eye on this issue for whenever OVER does make its way in!
@oguimbal Hello, its been a while that this issue was not seen. Do you have more of an ETA regarding this feature or is it still not yet priorized ? :)