[SYSTEMDS-3413] Add support for row/col aggregation to countDistinct builtin function
This patch converts countDistinct() from a non-parameterized builtin to a parameterized builtin function to allow for 1 new parameter: dir for direction. The value of dir can be r and c, denoting row-wise and column-wise aggregation respectively. This patch only implements CP and the SP case will throw a NotImplementedException()- the latter case will be addressed in a subsequent patch.
@phaniarnab @Baunsgaard @j143, would love your feedback!
I will look into this @BACtaki. Meanwhile, please check why the tests are failing.
I will look into this @BACtaki. Meanwhile, please check why the tests are failing.
Thanks @phaniarnab ! Ack, let me look into the test failures.
I like the PR, i do not see in the code why the tests would fail so i will force them to restart,
Thanks for reviewing @Baunsgaard and @phaniarnab ! Re the test failures, the following careless edit was the culprit:
@@ -2361,10 +2375,6 @@ public class DMLTranslator
case SUM:
case PROD:
case VAR:
- case COUNT_DISTINCT:
- currBuiltinOp = new AggUnaryOp(target.getName(), DataType.SCALAR, target.getValueType(),
- AggOp.valueOf(source.getOpCode().name()), Direction.RowCol, expr);
- break;
is the method supposed to be: countDistinctCol(X) or countDistinct(X, "c") Personally i would prefer if we had both, maybe @mboehm7 have an opinion?
That is a great idea! I have added countDistinctRow and countDistinctCol aliases for countDistinct(X, dir="r") and countDistinct(X, dir="c") respectively.
Thanks for the PR, while merging i changed a wildcard to individual imports. we should avoid wildcard imports.