ksql
ksql copied to clipboard
Clearer behavior when using UDAFs with *
Currently, using *
in a UDAF defaults to the ROWTIME
column. For example, if you run:
CREATE STREAM test (ID STRING) WITH (KAFKA_TOPIC='test', PARTITIONS=1, VALUE_FORMAT='JSON');
CREATE TABLE test_result AS SELECT ID, MAX(*) FROM test GROUP BY ID;
INSERT INTO test VALUES ('hello');
INSERT INTO test VALUES ('hello');
INSERT INTO test VALUES ('hello');
INSERT INTO test VALUES ('test');
Sample result:
SELECT * FROM TEST_RESULT;
+------------------------------------------------------------------------------------+------------------------------------------------------------------------------------+
|ID |KSQL_COL_0 |
+------------------------------------------------------------------------------------+------------------------------------------------------------------------------------+
|hello |1658336055224 |
|test |1658336060177 |
If you try to use a UDAF that does not support longs, ksql fails to resolve the function. This error message might be confusing to users if they don't realize that it's using ROWTIME
automatically. I deleted the avg
function for longs for the sake of testing:
CREATE TABLE test_result AS SELECT ID, AVG(*) FROM test GROUP BY ID;
Could not determine output schema for query due to error: Function 'avg' does not accept parameters (BIGINT).
- Is this behavior guaranteed to users, or is this undefined behavior?
- Should this behavior be documented somewhere?
- Should
*
be disallowed for most functions? This is what Postgres and MySQL do.
cc: @jzaralim
Piggybacking off of this issue - calling a UDAF with empty args does the same thing. AVG()
takes the average of ROWTIME
.
@JimGalasyn , can we update the documentation with this explicit behavior for now ?
I think we should at least figure out how we want *
to behave in the future. Can we agree on the following fixes in the longer run?
-
*
should be disallowed for most functions. - Empty args should be disallowed in all functions. None of our existing UDAFs are documented to support empty args. Functions like
COUNT
must be called with a*
. -
*
should refer to the entire record (very long run)