Key missing from projection.

Hello. Recently, we were dealing with joining streams together and creating aggregations from them. Upon this process, we realized we cannot create streams from CREATE STREAM AS SELECT expressions and use aggregations (GROUP BY, HAVING clauses). First question is:

Is there a future plan to use aggregations on persistent queries that create streams (CREATE STREAM AS SELECT expression maybe)

Then we leaned upon using TABLES instead of streams since they can be used with aggregation functions, and have the flexibility of joining streams etc. But when we are fiddling with tables and such, we came across with a problem. It is given below.

Problem

Assume we have a stream stream_1. We want to create a table table_1 such that counts the number of entries on the stream, grouped by some column. An example query would be:

CREATE TABLE table_1 AS
	SELECT
		COUNT(*)
	FROM stream_1
	GROUP BY col_1
	EMIT CHANGES;

Let me remind you when we run SELECT COUNT(*) FROM stream_1 GROUP BY col_1 EMIT CHANGES; it runs smoothly, and provides an expected outcome. However, when we try to CREATE TABLE table_1 AS expression, we get the following error:

Could not determine output schema for query due to error: Key missing from projection.
The query used to build `TABLE_1` must include the grouping expression COL_1 in its projection.
Statement: CREATE TABLE TABLE_1 WITH (KAFKA_TOPIC='TABLE_1', PARTITIONS=1, REPLICAS=1) AS SELECT COUNT(*) KSQL_COL_0
FROM STREAM_1 STREAM_1
GROUP BY STREAM_1.COL_1
EMIT CHANGES;

For sake of privacy, I modified column and stream/table names. What are we doing wrong here? We would be blissful if you guys share your wisdom.

Most regards, Emre

Jul 20 '22 11:07 emrekuecuek

#8534 might be related but I could not be 100% sure.

Jul 20 '22 11:07 emrekuecuek

Hello @emrekuecuek - thanks for reporting this! ksqlDB currently does not support "keyless tables" (every table must have a named key field). @nateab and @mjsax have some ideas about how we can lift this limitation, but for now all you have to do is also include the key in your projection:

CREATE TABLE table_1 AS
	SELECT
                col_1,
		COUNT(*)
	FROM stream_1
	GROUP BY col_1
	EMIT CHANGES;

As for your other question:

Is there a future plan to use aggregations on persistent queries that create streams (CREATE STREAM AS SELECT expression maybe)

An aggregation on a stream always produces a table - the semantics of a stream mean that each individual event stands alone whereas for a table new events "replace" old events with the same key. For the case of an aggregation, the latter is the case.

Jul 20 '22 17:07 agavra

ksql
ksql copied to clipboard