ksql-test-runner fails when joining to a stream created as a join.
When we try to join a stream and multiple tables together, we get an error in the KSQL test harness.
The code below produces this error:
Test failed: unknown topic: TEST_LOOKUP1_OUTPUT
This seems to happen no matter what input or output is provided.
CREATE STREAM test (
    id STRING,
    lookup1 STRING,
    lookup2 STRING)
  WITH (VALUE_FORMAT='JSON',
        KAFKA_TOPIC = 'test',
        KEY = 'id');
CREATE TABLE lookup1 (
    id STRING,
    value STRING)
  WITH (VALUE_FORMAT='JSON',
        KAFKA_TOPIC = 'lookup1',
        KEY = 'id');
CREATE TABLE lookup2 (
    id STRING,
    value STRING)
  WITH (VALUE_FORMAT='JSON',
        KAFKA_TOPIC = 'lookup2',
        KEY = 'id');
CREATE STREAM test_lookup1_output
  AS
  SELECT t.id AS id,
         l1.value AS lookup1_value,
         t.lookup2 AS lookup2
  FROM test t
  JOIN lookup1 l1
  ON t.lookup1 = l1.id;
CREATE STREAM test_lookup2_output
  AS
  SELECT t.id AS id,
         t.lookup1_value AS loo,
         l2.value AS lookup2_value
  FROM test_lookup1_output t
  JOIN lookup2 l2
  ON t.lookup2 = l2.id;
For some unknown reason, it sometimes works in JSON format and sometimes in AVRO format. I will try both to find a solution for my case.
@hjafarpour can you take a look?
As a workaround, create your streams separately and insert into them rather than using CREATE AS:
In the example @big-andy-coates provided in #5314, changing this line:
create stream s2 as select * from s1;
to this:
create stream s2 (id string, val string) with (kafka_topic='s2', value_format='json', partitions=1);
insert into s2 select * from s1;
makes the error go away.
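Applied to the statements from the original report, the same workaround might look roughly like the sketch below. It is untested against ksql-test-runner; the topic name 'test_lookup1_output' and the partition count of 1 are assumptions chosen for illustration, and the column list is copied from the SELECT in the original CREATE STREAM ... AS statement.

```sql
-- Sketch only: declare the intermediate stream explicitly instead of
-- creating it with CREATE STREAM ... AS SELECT.
-- Assumptions: the topic name and PARTITIONS=1 are illustrative values.
CREATE STREAM test_lookup1_output (
    id STRING,
    lookup1_value STRING,
    lookup2 STRING)
  WITH (VALUE_FORMAT='JSON',
        KAFKA_TOPIC='test_lookup1_output',
        PARTITIONS=1);

-- Populate the declared stream with the same join the original
-- CREATE STREAM ... AS SELECT used.
INSERT INTO test_lookup1_output
  SELECT t.id AS id,
         l1.value AS lookup1_value,
         t.lookup2 AS lookup2
  FROM test t
  JOIN lookup1 l1
  ON t.lookup1 = l1.id;
```

The second join (test_lookup2_output) could then read from test_lookup1_output as before, or be split the same way if the error persists.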
That workaround has the limitation of requiring the developer to specify a partition count. CREATE STREAM AS SELECT automatically assigns the partition count based on the partition count of its source(s). That's convenient because if you have, for example:
- one or more CREATE SOURCE statements which produce connectors -- the topics generated inherit whatever the topic configuration defaults are for partition count
- and then you have CREATE STREAM from those topics and then additional CSAS to transform/join
you avoid ever needing to hard code a partition count.
This workaround, while it has the benefit of allowing one to use ksql-test-runner without getting tripped up by the bug described in this issue, comes at the cost of having to hard-code partition counts. It would be great if CREATE STREAM allowed omitting the partition count (thereby inheriting some default), just like CREATE SOURCE.
No plan to address this?
+1, please fix. Thank you.
I added the needs-triage tag so that we'll look at this issue again this week.