dozer icon indicating copy to clipboard operation
dozer copied to clipboard

Support for `IN` clause in streaming SQL

Open snork-alt opened this issue 1 year ago • 17 comments

Description

Dozer streaming SQL does not currently support the IN operator. We need to extend support for IN with the following formats:

Lookup from a static list of values

SELECT column_name(s)
FROM table_name
WHERE column_name IN (value1, value2, ...);

This is pretty straightforward. Whenever an Operation is received, the value must be looked up from the list of static values provided in the SQL.

The behavior is similar to an equality condition. The following table explains how the WHERE operator handles messages:

  • INSERT: The insert message is propagated if the WHERE condition is matched
  • UPDATE: The previous and new values must be evaluated against the condition:
    • If both the previous value and current value do not match the condition, no action is taken
    • If both the previous value and current value match, the UPDATE is propagated downstream
    • If the previous value does not match, but the new values match, an INSERT is propagated downstream
    • If the previous value matches, but the new value does not match, a DELETE is propagated downstream
  • DELETE: The insert message is propagated if the WHERE condition is matched

Lookup from an inner SELECT:

SELECT column_name(s)
FROM table_name
WHERE column_name IN (SELECT STATEMENT);

This case is more complex as the values of the inner SELECT can change. This can follow a similar behavior of a JOIN operator. Whenever an INSERT message is received for the inner SELECT, for example, a lookup of the parent table must be performed to emit the values that were not previously matched. This behavior is the same one implemented in the JOIN. For such a case, the DAG should be constructed using a JOIN operator.

@mediuminvader can provide more details about the JOIN implementation

snork-alt avatar Jun 20 '23 17:06 snork-alt

I will give it a shot.

Rustin170506 avatar Jun 21 '23 05:06 Rustin170506

/bounty $600

snork-alt avatar Jun 21 '23 09:06 snork-alt

💎 $600 bounty • Dozer Data

Steps to solve:

  1. Start working: Comment /attempt #1659 with your implementation plan
  2. Submit work: Create a pull request including /claim #1659 in the PR body to claim the bounty
  3. Receive payment: 100% of the bounty is received 2-5 days post-reward. Make sure you are eligible for payouts

Additional opportunities:

  • 🔴 Livestream on Algora TV while solving this bounty & earn $200 upon merge! Make sure to have your camera and microphone on. Comment /livestream once live

Thank you for contributing to getdozer/dozer!

Add a bountyShare on socials

Attempt Started (GMT+0) Solution
🔴 @hi-rustin Jun 21, 2023, 5:42:00 AM WIP
🔴 @debaa98 Aug 10, 2023, 3:24:37 PM WIP
🟢 @uzmi1 Oct 28, 2023, 4:20:30 PM WIP
🔴 @Rutik7066 Dec 30, 2023, 8:22:56 AM WIP
🟢 @varshith257 May 17, 2024, 8:09:33 PM WIP
🔴 @ydv129 Jun 23, 2024, 3:49:08 AM WIP

algora-pbc[bot] avatar Jun 21 '23 09:06 algora-pbc[bot]

Note: this has already been taken up by @hi-rustin

snork-alt avatar Jun 21 '23 09:06 snork-alt

/attempt #1659

Rustin170506 avatar Jun 21 '23 16:06 Rustin170506

Lookup from an inner SELECT:

SELECT column_name(s)
FROM table_name
WHERE column_name IN (SELECT STATEMENT);

This case is more complex as the values of the inner SELECT can change. This can follow a similar behavior of a JOIN operator. Whenever an INSERT message is received for the inner SELECT, for example, a lookup of the parent table must be performed to emit the values that were not previously matched. This behavior is the same one implemented in the JOIN. For such a case, the DAG should be constructed using a JOIN operator.

@mediuminvader Could you please give some tips about how to implement it?

InSubquery is a SqlExpr. We will get a sqlparser::ast::Query from it, but I don't how to handle it in SelectionProcessor. How to rewrite a sub query as a join here?

Rustin170506 avatar Jul 03 '23 10:07 Rustin170506

Hi @hi-rustin, to support a SubQuery I think we can go 2 ways:

  • the first one is to build the graph accordingly, allowing to plug a sub query to the SelectionProcessor, somehow it's happening in the ProductProccessor, if we have something like SELECT fields FROM (SELECT STATEMENT) the subquery is parsed here and then graph output is plugged to the ProductProcessor input. Need something similar for the SelectionProcessor that currently doesn't support multiple inputs. To summarize this need to work on pipeline builder side with some changes on SelectionProcessor.
  • Another option is to rewrite the query, but that should happen in an early stage. We don't have that stage yet, but extending the Planner might be good enough to achieve this.

Hope this helps, but feel free to contact me for any question.

mediuminvader avatar Jul 03 '23 14:07 mediuminvader

Can I still make an attempt at this?

crajcan avatar Jul 27 '23 15:07 crajcan

/attempt #1659

Options

debaa98 avatar Aug 10 '23 15:08 debaa98

Hey @snork-alt, is this still open?

Ahtritus avatar Sep 14 '23 06:09 Ahtritus

Hey @Ahtritus select with IN from constant vales had been implemented but IN with nested select had not.

snork-alt avatar Sep 14 '23 07:09 snork-alt

/attempt #1659

Options

uzmi1 avatar Oct 28 '23 16:10 uzmi1

Note: The user @debaa98 is already attempting to complete issue #1659 and claim the bounty. If you attempt to complete the same issue, there is a chance that @debaa98 will complete the issue first, and be awarded the bounty. We recommend discussing with @debaa98 and potentially collaborating on the same solution versus creating an alternate solution.

algora-pbc[bot] avatar Oct 28 '23 16:10 algora-pbc[bot]

Hey @snork-alt May i work on this.

Rutik7066 avatar Dec 13 '23 15:12 Rutik7066

/attempt #1659

Options

Rutik7066 avatar Dec 30 '23 08:12 Rutik7066

/attempt #1659

Algora profile Completed bounties Tech Active attempts Options
@varshith257 1 bounty from 1 project
TypeScript, Go
Cancel attempt

varshith257 avatar May 17 '24 20:05 varshith257

/attempt #1659

Options

ydv129 avatar Jun 23 '24 03:06 ydv129