[TypeDB 3.x] Grouped aggregates and non-terminal reductions
Problem to Solve
Grouped aggregation is a core data analysis operation.
In its current specification, TypeDB "almost" supports this feature, but it could be made much more convenient.
Current Workaround
For example, consider counting posts from a specific country by tag. Using #7038 we can write:
with fun count_posts($tag: tag, $country: country) -> long:
match
$post isa post;
location_check($post, $country) == true;
(post: $post, tag: $tag) isa tagging;
return count($post);
match
$country isa country, has name "Australia";
$post isa post;
location_check($post, $country) == true;
(post: $post, tag: $tag) isa tagging;
select $tag, $country;
distinct;
$count = count_posts($tag, $country);
fetch:
"tag": $tag.text;
"post count": $count;
But as this simple example shows, the current workaround requires code duplication: the same match pattern is written once inside the function and again in the main query.
Proposed Solution
Extend the role of the reduce operator in pipelines (in particular, make it non-terminal). More specifically:
- introduce an "as" clause in reduce statements, to specify the variables in which to store values for the next stage of the pipeline;
- introduce a @group($var, ...) annotation, to specify the variables by which to group the incoming stream before reducing each group.
With this, the previous example becomes:
match
$country isa country, has name "Australia";
$post isa post;
location_check($post, $country) == true;
(post: $post, tag: $tag) isa tagging;
reduce @group($tag) count($post) as $count;
fetch:
"tag": $tag.text;
"post count": $count;
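The intended semantics of the grouped, non-terminal reduce in the query above can be sketched outside TypeDB. The Python below is only an illustrative model, not TypeDB's implementation: the function name reduce_group_count, the row representation (one dict per answer), and the sample data are all assumptions made for this sketch. It groups the incoming answer stream by the grouping variables, counts the rows in each group, and emits one row per group, so that a later stage (here standing in for fetch) can keep consuming the stream.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, Tuple

Row = Dict[str, object]


def reduce_group_count(
    rows: Iterable[Row],
    group_vars: Tuple[str, ...],
    counted_var: str,
    out_var: str,
) -> Iterator[Row]:
    """Hypothetical model of `reduce @group(...) count(...) as ...`.

    Emits one output row per distinct combination of group_vars,
    carrying the group variables plus the count stored under out_var.
    """
    counts: Counter = Counter()
    for row in rows:
        # Only rows that bind the counted variable contribute to the count.
        if counted_var in row:
            key = tuple(row[v] for v in group_vars)
            counts[key] += 1
    # Counter keeps first-seen order, so groups come out in stream order.
    for key, n in counts.items():
        out: Row = dict(zip(group_vars, key))
        out[out_var] = n
        yield out


# Made-up answers of the match stage: one row per (post, tag) pair.
answers = [
    {"post": "p1", "tag": "travel"},
    {"post": "p2", "tag": "travel"},
    {"post": "p1", "tag": "food"},
]

# The reduce stage is not terminal: its output rows feed the next stage.
grouped = reduce_group_count(answers, ("tag",), "post", "count")

# Stand-in for the fetch stage, projecting each grouped row.
result = [{"tag": r["tag"], "post count": r["count"]} for r in grouped]
print(result)
```

Because the reduce stage yields ordinary rows, no helper function and no repeated match pattern is needed, which is the duplication the proposal aims to remove.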
Additional Information
References:
- https://duckdb.org/2022/03/07/aggregate-hashtable.html
- https://www.ibm.com/docs/en/psfa/7.1.0?topic=functions-grouped-aggregates