morphir-elm
Support grouping by multiple columns in Morphir SDK Aggregations
Morphir.SDK.Aggregate.groupBy takes an argument called getKey to specify the columns to group by. This can be either:
- a single field function (implemented), or
- a tuple of keys constructed with the keyN function (from Morphir.SDK.Key), where N is between 2 and 16.
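For the tuple case, here is a minimal plain-Elm sketch of what a two-column key presumably looks like, assuming key2 simply applies each field accessor to the record and returns the results as a pair. pairKey and the Antique record are hypothetical stand-ins, not the Morphir.SDK.Key implementation:

```elm
module PairKeySketch exposing (Antique, exampleKey, pairKey)

-- Hypothetical record type standing in for the real Antique type.
type alias Antique =
    { category : String
    , product : String
    , ageofItem : Float
    }


-- Hypothetical stand-in for Morphir.SDK.Key.key2 (assumption, not the SDK code):
-- apply both accessors to the same record and return the results as a pair.
pairKey : (a -> k1) -> (a -> k2) -> a -> ( k1, k2 )
pairKey getKey1 getKey2 item =
    ( getKey1 item, getKey2 item )


-- The key produced for a single row.
exampleKey : ( String, String )
exampleKey =
    pairKey .category .product (Antique "HouseHoldCollection" "Furniture" 10.0)
```

Note that Elm tuples have at most three elements, so keys built with key4 and above must be encoded some other way; how Morphir.SDK.Key does that is not covered by this sketch.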
Implementing this will involve:
- Create an example that groups by multiple keys.
  - This will likely also involve working out what key, as provided to the lambda we pass into aggregate (i.e. \key inputs -> ...), actually is in that case, and how to use it.
  - In Spark, grouping by multiple columns causes the columns to be repeated in the output as separate columns. Is the same true in Morphir SDK Aggregations?
- Change the AggregationCall type to store a List of Names for the group key, and probably a List of Maybe Names for the returned group key.
- Change the constructAggregationCall function to successfully parse the keyFields as either a single field or a keyN call (this may require changing the circumstances under which we restrict the number of keyFields).
- Extend the Aggregate ObjectExpression to take a List of Strings for its group-by columns, with corresponding changes in objectExpressionFromAggregationCall, mapObjectExpressionToScala, and Morphir.Spark.API.aggregate.
- Write tests to cover the new example.
- Update the documentation to describe what is now supported and what the output looks like.
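The AggregationCall change in the list above could take roughly the following shape. This is a hypothetical sketch: the GroupKeys record, its field names, and outputColumns are made up for illustration, Name is a local stand-in for Morphir.IR.Name.Name, and falling back to the original column name when no returned group key is known is an assumption, not existing behaviour:

```elm
module GroupKeySketch exposing (GroupKeys, Name, outputColumns)

-- Local stand-in for Morphir.IR.Name.Name (a list of words).
type alias Name =
    List String


-- Hypothetical group-key part of AggregationCall:
-- one Name per grouped column, plus an optional returned name per column.
type alias GroupKeys =
    { groupKeys : List Name
    , returnedGroupKeys : List (Maybe Name)
    }


-- Pair each group key with its returned name, falling back to the
-- original column name when no returned name was captured (assumption).
outputColumns : GroupKeys -> List Name
outputColumns { groupKeys, returnedGroupKeys } =
    List.map2
        (\groupKey returned -> Maybe.withDefault groupKey returned)
        groupKeys
        returnedGroupKeys
```

List.map2 stops at the shorter list, so the two lists are expected to have one entry per grouped column.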
To expand on this a bit: the first step is to familiarise oneself with how keyN works in groupBy and aggregate.
For a list of Antiques called source:
source
    |> groupBy (key2 .category .product)
    |> aggregate
        (\key inputs ->
            ...
        )
Find out what 'key' is, and how to create named columns from it.
For example, with input data like:
category             product     ageofItem  ...
HouseHoldCollection  Furniture   10.0       ...
HouseHoldCollection  Furniture   12.0       ...
HouseHoldCollection  Plates      20.0       ...
HouseHoldCollection  Plates      22.0       ...
PaintCollections     Paintings   50.0       ...
PaintCollections     Paintings   52.0       ...
how do you get the following output?
category             product     oldest
HouseHoldCollection  Furniture   12.0
HouseHoldCollection  Plates      22.0
PaintCollections     Paintings   52.0
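One way to explore this outside of Morphir is to model the computation directly in Elm with elm/core's Dict, assuming that grouping by key2 .category .product hands the aggregate lambda a ( category, product ) tuple. The types and groupByPair below are hypothetical and do not use Morphir.SDK.Aggregate:

```elm
module OldestPerGroup exposing (Antique, OldestRow, antiques, oldestPerGroup)

import Dict exposing (Dict)


-- Hypothetical input row, mirroring the columns in the example data.
type alias Antique =
    { category : String
    , product : String
    , ageofItem : Float
    }


-- Hypothetical output row with the group key split into named columns.
type alias OldestRow =
    { category : String
    , product : String
    , oldest : Float
    }


antiques : List Antique
antiques =
    [ Antique "HouseHoldCollection" "Furniture" 10.0
    , Antique "HouseHoldCollection" "Furniture" 12.0
    , Antique "HouseHoldCollection" "Plates" 20.0
    , Antique "HouseHoldCollection" "Plates" 22.0
    , Antique "PaintCollections" "Paintings" 50.0
    , Antique "PaintCollections" "Paintings" 52.0
    ]


-- Group rows by the ( category, product ) pair, playing the role of
-- groupBy (key2 .category .product). foldr keeps rows in input order.
groupByPair : List Antique -> Dict ( String, String ) (List Antique)
groupByPair items =
    List.foldr
        (\item acc ->
            Dict.update ( item.category, item.product )
                (\existing -> Just (item :: Maybe.withDefault [] existing))
                acc
        )
        Dict.empty
        items


-- Destructure the tuple key into named columns and take the maximum age.
-- Groups are never empty, so the default of 0 is never used.
oldestPerGroup : List Antique -> List OldestRow
oldestPerGroup items =
    groupByPair items
        |> Dict.toList
        |> List.map
            (\( ( category, product ), rows ) ->
                { category = category
                , product = product
                , oldest = rows |> List.map .ageofItem |> List.maximum |> Maybe.withDefault 0
                }
            )
```

Destructuring the tuple in the lambda is what turns the anonymous key into named category and product columns. Dict.toList returns groups in ascending key order, which for this data matches the expected output.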
Existing examples that make use of groupBy and aggregate can be found in tests-integration/spark/model/src/SparkTests/AggregationTests.elm.
AggregationCall and constructAggregationCall can be found in src/Morphir/SDK/Aggregate.elm.
The purpose of "group key" vs. "returned group key" is explained in https://github.com/finos/morphir-elm/issues/799#issuecomment-1191714998. The gist of it is that in Elm, when you write:
testDataSet
    |> groupBy .key1
    |> aggregate
        (\key inputs ->
            { key = key
            , count = inputs (count |> withFilter (\a -> a.value < 7))
            , sum = inputs (sumOf .value)
            , max = inputs (maximumOf .value)
            , min = inputs (minimumOf .value)
            }
        )
that creates a field named "key" from a field that was named "key1": the group key is "key1", while the returned group key is "key". This work is to handle multiple group keys. We currently ignore the returned group key (see #842 for the task to implement it).
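With two keys, the returned group keys would be whatever names the lambda gives to the tuple components. A hypothetical sketch, assuming the key arrives as a ( category, product ) pair and simplifying inputs to a plain List (in Morphir.SDK.Aggregate, inputs is instead applied to an aggregation such as sumOf .value). Here the group keys are category and product, while the returned group keys are cat and prod:

```elm
module ReturnedKeySketch exposing (Summary, toSummary)

-- Hypothetical output row: the returned group keys are "cat" and "prod",
-- while the group keys they come from are "category" and "product".
type alias Summary =
    { cat : String
    , prod : String
    , count : Int
    }


-- Plays the role of the lambda passed to aggregate when grouping by two
-- keys, assuming the key arrives as a ( category, product ) pair.
toSummary : ( String, String ) -> List a -> Summary
toSummary ( category, product ) inputs =
    { cat = category
    , prod = product
    , count = List.length inputs
    }
```

Recovering this mapping means pairing each group key with the field name it ends up under in the output record, one pair per grouped column.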
I created a PR with all my extant work on this at https://github.com/finos/morphir-elm/pull/911