ksql-test-runner fails when joining to a stream created as a join.

Open jeff-goddard opened this issue 5 years ago • 7 comments

When we try to join a stream and multiple tables together, we get an error in the KSQL test harness.

The statements further down produce this error:

Test failed: unknown topic: TEST_LOOKUP1_OUTPUT

This seems to happen no matter what input or output is provided.

CREATE STREAM test (
       id STRING,
       lookup1 STRING,
       lookup2 STRING)
  WITH (VALUE_FORMAT='JSON',
       KAFKA_TOPIC = 'test',
       KEY = 'id');
       
CREATE TABLE lookup1 (
       id STRING,
       value STRING)
  WITH (VALUE_FORMAT='JSON',
       KAFKA_TOPIC = 'lookup1',
       KEY = 'id');

CREATE TABLE lookup2 (
       id STRING,
       value STRING)
  WITH (VALUE_FORMAT='JSON',
       KAFKA_TOPIC = 'lookup2',
       KEY = 'id');

CREATE STREAM test_lookup1_output
    AS 
SELECT t.id AS id,
       l1.value AS lookup1_value,
       t.lookup2 AS lookup2
  FROM test t
  JOIN lookup1 l1
    ON t.lookup1 = l1.id;

CREATE STREAM test_lookup2_output
    AS 
SELECT t.id AS id,
       t.lookup1_value AS loo,
       l2.value AS lookup2_value
  FROM test_lookup1_output t
  JOIN lookup2 l2
    ON t.lookup2 = l2.id;

jeff-goddard avatar Aug 07 '19 13:08 jeff-goddard

For some unknown reason, it sometimes works in JSON format and sometimes in AVRO format. I will try both to find a solution for my case.

xs005 avatar Aug 09 '19 14:08 xs005

@hjafarpour can you take a look?

apurvam avatar Aug 12 '19 18:08 apurvam

As a workaround, create your streams separately and insert into them rather than using CREATE STREAM AS SELECT:

In the example @big-andy-coates provided in #5314, changing this line:

create stream s2 as select * from s1;

to this:

create stream s2 (id string, val string) with (kafka_topic='s2', value_format='json', partitions=1);
insert into s2 select * from s1;

makes the error go away.
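
Applied to the statements in the original report, the same approach might look like the sketch below. It is untested and rests on assumptions: the topic name 'test_lookup1_output' and PARTITIONS=1 are placeholders chosen for illustration, and the partition count would need to match the lookup2 topic for the second join to be co-partitioned.

```sql
-- Sketch only: declare the intermediate stream explicitly instead of using CREATE STREAM AS SELECT.
-- The topic name and partition count are assumed placeholders, not values from the report.
CREATE STREAM test_lookup1_output (
       id STRING,
       lookup1_value STRING,
       lookup2 STRING)
  WITH (VALUE_FORMAT='JSON',
       KAFKA_TOPIC = 'test_lookup1_output',
       PARTITIONS = 1);

-- Populate it with the same join the original CREATE STREAM AS SELECT used.
INSERT INTO test_lookup1_output
SELECT t.id AS id,
       l1.value AS lookup1_value,
       t.lookup2 AS lookup2
  FROM test t
  JOIN lookup1 l1
    ON t.lookup1 = l1.id;
```

The second stream, test_lookup2_output, would presumably be left as in the original report, since it only reads from the stream declared above.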

panasenco avatar Jun 19 '20 17:06 panasenco

That workaround has the limitation of requiring the developer to specify a partition count. CREATE STREAM AS SELECT automatically assigns the partition count based on the partition count of its source(s). That's convenient if you have, for example:

  • one or more CREATE SOURCE statements which produce connectors -- the topics generated inherit whatever the topic configuration defaults are for partition count
  • and then you have CREATE STREAM from those topics and then additional CSAS to transform/join

you avoid ever needing to hard code a partition count.

This workaround, by contrast, lets one use ksql-test-runner without getting tripped up by the bug described in this issue, but at the cost of having to hard code partition counts. It would be great if CREATE STREAM allowed omitting the partition count (thereby inheriting some default), just like CREATE SOURCE.

benissimo avatar Aug 07 '20 10:08 benissimo

No plan to address this?

colebaileygit avatar Aug 23 '21 13:08 colebaileygit

+1 please fix. Thank you.

CiroDiMarzo avatar Aug 12 '22 09:08 CiroDiMarzo

I added the needs-triage tag so that we'll look at this issue again this week.

jnh5y avatar Aug 15 '22 14:08 jnh5y