ksql icon indicating copy to clipboard operation
ksql copied to clipboard

N-way table/table/table join

Open CiroDiMarzo opened this issue 3 years ago • 5 comments

Is your feature request related to a problem? Please describe. It seems to me there is no way to join three tables when their primary key is not the same for all of them. I have three tables related as illustrated below.

USER:
- UserId: PK

REVIEWER:
- ReviewerId: PK
- UserId: FK ---> USER: UserId

EMAILADDRESS:
- EmailAddressId: PK
- UserId: FK ---> USER: UserId

I follow the example illustrated in this guide: https://docs.ksqldb.io/en/latest/developer-guide/joins/join-streams-and-tables/#n-way-joins

Of course the example above is different, because is involves streams, which don't have the same requirements for joins as tables. Anyway, I tried this all the same and of course it resulted in an error:

CREATE OR TABLE `reviewer-email-user`
    AS SELECT *
    FROM USER
        INNER JOIN REVIEWER ON USER.USERID = REVIEWER.USERID
        INNER JOIN EMAILADDRESS ON  USER.USERID = EMAILADDRESS.USERID
EMIT CHANGES;

The error is, as you would expect:

Could not determine output schema for query due to error: Invalid join condition: table-table joins require to join on the primary key of the right input table. Got USER.USERID = REVIEWER.USERID

As a second option, I tried to join the first two tables and then I would join the result with the third.

CREATE OR TABLE `reviewer-user`
    AS SELECT *
    FROM REVIEWER
        INNER JOIN REVIEWER ON REVIEWER.USERID = USER.USERID
EMIT CHANGES;

The above works, however the resulting table has for PK that which was the PK of the left table, which is ReviewerId. So, now the result table cannot be joined with the third table, EMAILADDRESS.

Describe the solution you'd like I would like to join three tables when:

  • TABLE 1 has a given PK
  • TABLE 2 and 3 have a FK pointing to the PK of TABLE 1

Describe alternatives you've considered I was lucky because, in my very specific case, EMAILADDRESS had another column, "IsPrimary". When this column is true then there is a 1:1 relationship between EmailAddressId and UserId. So I could repartition EMAILADDRESS by UserId, thus enabling the join between the table resulting from the REVIEWER-USER join and EMAILADDRESS (as the right-hand table in the join).

CiroDiMarzo avatar Jul 27 '22 06:07 CiroDiMarzo

Hi @CiroDiMarzo !

Unfortunately, we don't yet support N-way foreign key joins which is what you need in this case. It is on the roadmap though and we are planning to add support for it. In the meantime, your alternative solution is your best bet :)

vpapavas avatar Aug 16 '22 19:08 vpapavas

Thank you @vpapavas

But I have another question.

Is this feature missing in the underlying Kafka Stream library? Or is it just ksqlDB that do not support N-way foreign key joins while Kafka Stream would allow it, if I were to use it.

CiroDiMarzo avatar Aug 17 '22 09:08 CiroDiMarzo

May I know the tentative timeline of this feature (we are converting our DW using ksql and this is constant pain point) ? Thanks

dttouchdata avatar Aug 26 '22 19:08 dttouchdata

Any news on this? Where can we see the roadmap for KSQL - I am also very interested in this feature?

IainAdamsLabs avatar May 30 '23 11:05 IainAdamsLabs

Is there any sort of estimate\projection on when this feature will be added to ksqlDB?

burkematthew avatar Feb 26 '24 22:02 burkematthew