ksql
ksql copied to clipboard
Key missing from projection error when creating table
Key missing from projection.
Hello. Recently, we were dealing with joining streams together and creating aggregations from them. Upon this process, we realized we cannot create streams from CREATE STREAM AS SELECT
expressions and use aggregations (GROUP BY
, HAVING
clauses). First question is:
- Is there a future plan to use aggregations on persistent queries that create streams (
CREATE STREAM AS SELECT
expression maybe)
Then we leaned upon using TABLES
instead of streams since they can be used with aggregation functions, and have the flexibility of joining streams etc. But when we are fiddling with tables and such, we came across with a problem. It is given below.
Problem
Assume we have a stream stream_1
. We want to create a table table_1
such that counts the number of entries on the stream, grouped by some column. An example query would be:
CREATE TABLE table_1 AS
SELECT
COUNT(*)
FROM stream_1
GROUP BY col_1
EMIT CHANGES;
Let me remind you when we run SELECT COUNT(*) FROM stream_1 GROUP BY col_1 EMIT CHANGES;
it runs smoothly, and provides an expected outcome. However, when we try to CREATE TABLE table_1 AS
expression, we get the following error:
Could not determine output schema for query due to error: Key missing from projection.
The query used to build `TABLE_1` must include the grouping expression COL_1 in its projection.
Statement: CREATE TABLE TABLE_1 WITH (KAFKA_TOPIC='TABLE_1', PARTITIONS=1, REPLICAS=1) AS SELECT COUNT(*) KSQL_COL_0
FROM STREAM_1 STREAM_1
GROUP BY STREAM_1.COL_1
EMIT CHANGES;
For sake of privacy, I modified column and stream/table names. What are we doing wrong here? We would be blissful if you guys share your wisdom.
Most regards, Emre
#8534 might be related but I could not be 100% sure.
Hello @emrekuecuek - thanks for reporting this! ksqlDB currently does not support "keyless tables" (every table must have a named key field). @nateab and @mjsax have some ideas about how we can lift this limitation, but for now all you have to do is also include the key in your projection:
CREATE TABLE table_1 AS
SELECT
col_1,
COUNT(*)
FROM stream_1
GROUP BY col_1
EMIT CHANGES;
As for your other question:
Is there a future plan to use aggregations on persistent queries that create streams (CREATE STREAM AS SELECT expression maybe)
An aggregation on a stream always produces a table - the semantics of a stream mean that each individual event stands alone whereas for a table new events "replace" old events with the same key. For the case of an aggregation, the latter is the case.