dozer
dozer copied to clipboard
Support for `IN` clause in streaming SQL
Description
Dozer streaming SQL does not currently support the IN
operator. We need to extend support for IN
with the following formats:
Lookup from a static list of values
SELECT column_name(s)
FROM table_name
WHERE column_name IN (value1, value2, ...);
This is pretty straightforward. Whenever an Operation
is received, the value must be looked up from the list of static values provided in the SQL.
The behavior is similar to an equality condition. The following table explains how the WHERE
operator handles messages:
-
INSERT
: The insert message is propagated if the WHERE condition is matched -
UPDATE
: The previous and new values must be evaluated against the condition:- If both the previous value and current value do not match the condition, no action is taken
- If both the previous value and current value match, the UPDATE is propagated downstream
- If the previous value does not match, but the new values match, an INSERT is propagated downstream
- If the previous value matches, but the new value does not match, a DELETE is propagated downstream
-
DELETE
: The insert message is propagated if the WHERE condition is matched
Lookup from an inner SELECT
:
SELECT column_name(s)
FROM table_name
WHERE column_name IN (SELECT STATEMENT);
This case is more complex as the values of the inner SELECT can change. This can follow a similar behavior of a JOIN operator. Whenever an INSERT message is received for the inner SELECT, for example, a lookup of the parent table must be performed to emit the values that were not previously matched. This behavior is the same one implemented in the JOIN. For such a case, the DAG should be constructed using a JOIN operator.
@mediuminvader can provide more details about the JOIN implementation
I will give it a shot.
/bounty $600
💎 $600 bounty • Dozer Data
Steps to solve:
-
Start working: Comment
/attempt #1659
with your implementation plan -
Submit work: Create a pull request including
/claim #1659
in the PR body to claim the bounty - Receive payment: 100% of the bounty is received 2-5 days post-reward. Make sure you are eligible for payouts
Additional opportunities:
-
🔴 Livestream on Algora TV while solving this bounty & earn $200 upon merge! Make sure to have your camera and microphone on. Comment
/livestream
once live
Thank you for contributing to getdozer/dozer!
Add a bounty • Share on socials
Attempt | Started (GMT+0) | Solution |
---|---|---|
🔴 @hi-rustin | Jun 21, 2023, 5:42:00 AM | WIP |
🔴 @debaa98 | Aug 10, 2023, 3:24:37 PM | WIP |
🟢 @uzmi1 | Oct 28, 2023, 4:20:30 PM | WIP |
🔴 @Rutik7066 | Dec 30, 2023, 8:22:56 AM | WIP |
🟢 @varshith257 | May 17, 2024, 8:09:33 PM | WIP |
🔴 @ydv129 | Jun 23, 2024, 3:49:08 AM | WIP |
Note: this has already been taken up by @hi-rustin
/attempt #1659
Lookup from an inner
SELECT
:SELECT column_name(s) FROM table_name WHERE column_name IN (SELECT STATEMENT);
This case is more complex as the values of the inner SELECT can change. This can follow a similar behavior of a JOIN operator. Whenever an INSERT message is received for the inner SELECT, for example, a lookup of the parent table must be performed to emit the values that were not previously matched. This behavior is the same one implemented in the JOIN. For such a case, the DAG should be constructed using a JOIN operator.
@mediuminvader Could you please give some tips about how to implement it?
InSubquery is a SqlExpr. We will get a sqlparser::ast::Query from it, but I don't how to handle it in SelectionProcessor. How to rewrite a sub query as a join here?
Hi @hi-rustin, to support a SubQuery I think we can go 2 ways:
- the first one is to build the graph accordingly, allowing to plug a sub query to the SelectionProcessor, somehow it's happening in the ProductProccessor, if we have something like
SELECT fields FROM (SELECT STATEMENT)
the subquery is parsed here and then graph output is plugged to the ProductProcessor input. Need something similar for the SelectionProcessor that currently doesn't support multiple inputs. To summarize this need to work on pipeline builder side with some changes on SelectionProcessor. - Another option is to rewrite the query, but that should happen in an early stage. We don't have that stage yet, but extending the Planner might be good enough to achieve this.
Hope this helps, but feel free to contact me for any question.
Can I still make an attempt at this?
Hey @snork-alt, is this still open?
Hey @Ahtritus select with IN from constant vales had been implemented but IN with nested select had not.
Note: The user @debaa98 is already attempting to complete issue #1659 and claim the bounty. If you attempt to complete the same issue, there is a chance that @debaa98 will complete the issue first, and be awarded the bounty. We recommend discussing with @debaa98 and potentially collaborating on the same solution versus creating an alternate solution.
Hey @snork-alt May i work on this.
/attempt #1659
Algora profile | Completed bounties | Tech | Active attempts | Options |
---|---|---|---|---|
@varshith257 | 1 bounty from 1 project | TypeScript, Go |
Cancel attempt |