beanquery icon indicating copy to clipboard operation
beanquery copied to clipboard

Query result of a column as a percentage

Open jenicek opened this issue 3 years ago • 7 comments
trafficstars

When performing queries, it is often useful to express one of the columns as a percentage of the total. An example I see commonly is reporting on Assets allocation - grouping Assets by some key and reporting the percentage of cost (value) per group. If I did not miss anything, this cannot be currently done with the query language alone. Would you find this useful to add to the query language? In fava, this would allow to replace many plugins with a single query and I would find it useful for command-line usage too.

If interested, what do you think would be the best way to add this? I can think of three possibilities

  • allowing simple subqueries (select position / (select sum(position))) - this would be versatile (would allow for a different normalization, such as division by max) and consistent with SQL
  • allowing reusing query results in subsequent queries (e.g. by naming them) - also versatile, maybe useful also outside of this specific use-case
  • introducing a new construct to the language (e.g. a virtual column total, like balance) - this looks quite hacky, but would solve this particular issue and seems the easiest to implement

jenicek avatar Feb 01 '22 19:02 jenicek

Development of bean-query has moved to the beanquery project in the beancount group. I think the solution that makes more sense is the support for subqueries, however it is not trivial to implement.

dnicolodi avatar Feb 01 '22 20:02 dnicolodi

Should I open the issue there instead? Do you think this would be useful and worth pursuing? I was looking into how query parsing is done in beancount, but I have no background in grammars, except for a math course at the university some time ago.

jenicek avatar Feb 02 '22 12:02 jenicek

The bean-query code should me removed from this repo and all issues moved to the new beanquery project. This issue can move with the others. No need to close it and open a new one. The parsing of the query is probably the easiest problem to solve. The hardest part would be the changes required to the query execution.

dnicolodi avatar Feb 03 '22 20:02 dnicolodi

Thanks for the info. I may have a very simplistic point of view, but the evaluation seems to be a couple of lines to me. If the subquery would be represented by e.g. query_compile.EvalFunction, then it seems that it would be enough to filter all subqueries at https://github.com/beancount/beancount/blob/master/beancount/query/query_execute.py#L266 , recursively call execute_query on the stored subquery and replace it with the resulting value. Am I making a mistake somewhere?

jenicek avatar Feb 04 '22 15:02 jenicek

That would cover just the most basic use of subqueries, namely subqueries in scalar context https://www.postgresql.org/docs/14/sql-expressions.html#SQL-SYNTAX-SCALAR-SUBQUERIES but not really as it would not support using variables from the surrounding query, IMHO an essential feature for any use that is not very simple. What I was thinking about supporting subquery expressions https://www.postgresql.org/docs/current/functions-subquery.html#id-1.5.8.29.15 and subquery table expressions https://www.postgresql.org/docs/14/queries-table-expressions.html#QUERIES-SUBQUERIES

dnicolodi avatar Feb 18 '22 15:02 dnicolodi

You are right, this would open a lot of new possibilities, and I think it could be very useful. I will try to look into different types of subqueries and how they should be combined with the original query and see if I can find a way to integrate it with the beancount codebase. Thanks for the background.

jenicek avatar Feb 20 '22 15:02 jenicek

@blais Transfer to beanquery?

dnicolodi avatar Apr 10 '22 14:04 dnicolodi