dice icon indicating copy to clipboard operation
dice copied to clipboard

Add JSON support to WHERE clause evaluation in Executor

Open JyotinderSingh opened this issue 1 year ago • 19 comments

We currently support string, integer, and float comparisons in the DSQL executor (core/executor.go). We need to support JSON fields as part of WHERE clause expressions.

Requirements

Syntax support

  • Support $value.field syntax in DSQL queries.
  • Example
SELECT $key, $value FROM `pattern?` WHERE $value.field > 10;
  • Support nested field queries, such as $value.field1.field2.field3.

Testing

  • Add comprehensive unit tests and integration tests to verify the behavior.
  • benchmark the code and measure the time taken and memory allocs using benchmem and try to keep them to the bare minimum.
  • Include screenshots of the benchmark as a part of the PR description.

Notes

This feature can be worked on once #165 is completed.

JyotinderSingh avatar Jul 30 '24 12:07 JyotinderSingh

@JyotinderSingh can I work on this one as well ?

vaibh1297 avatar Jul 31 '24 04:07 vaibh1297

@JyotinderSingh can I work on this one as well ?

Will assign once the other issue assigned to you is completed. In case someone wants to take this up in the meantime they are free to do so.

JyotinderSingh avatar Jul 31 '24 04:07 JyotinderSingh

Hey @JyotinderSingh can I pick up this task

YahyaHaq avatar Jul 31 '24 12:07 YahyaHaq

Hey @JyotinderSingh can I pick up this task

Assigned

JyotinderSingh avatar Jul 31 '24 12:07 JyotinderSingh

@arpitbbhayani @JyotinderSingh While working on this task I came across a couple of issues I wanted to discuss

  1. Problem: We have a bug in our QWATCH command. We can store a value as an integer but on retrieval it treats it as a string. Running this query in qwatch_test.go var qWatchQuery = "SELECT $key, $value FROM 'match:100:*' WHERE $value > 1 ORDER BY $value DESC LIMIT 3" will give us an error "2024/08/03 12:43:48 ERRO incompatible types in comparison: string and int". Solution: I can create a fix which will ensure all values stored will be retrieved with the same type at storage time. I can create this fix and add extensive tests within this PR or I can create a new PR for this fix. I believe its important to fix this before moving onto filtering based on JSON
  2. There is an issue with sqlparser. It isn't compatible with our style of querying JSON. When I pass this query SELECT $key, $value FROM 'match:100:*' WHERE $value.field1.field2.field3 > 1 ORDER BY $value DESC LIMIT 3 it gives me this error on parsing the SQL statement "error parsing SQL statement: syntax error at position 67". This error does disappear if I limit my query to two nested fields rather than 3 but then the issue arises that it ignores $value and considers field2 to be our column name. Solution: We can follow the convention present in PostgresSQL and query json using ->. this will transform our query into this format: SELECT $key, $value FROM 'match:100:*' WHERE $value->field1->field2->field3 > 1 ORDER BY $value DESC LIMIT 3. What are your opinions on this approach

YahyaHaq avatar Aug 03 '24 08:08 YahyaHaq

@arpitbbhayani @JyotinderSingh While working on this task I came across a couple of issues I wanted to discuss

  1. Problem: We have a bug in our QWATCH command. We can store a value as an integer but on retrieval it treats it as a string. Running this query in qwatch_test.go var qWatchQuery = "SELECT $key, $value FROM 'match:100:*' WHERE $value > 1 ORDER BY $value DESC LIMIT 3" will give us an error "2024/08/03 12:43:48 ERRO incompatible types in comparison: string and int".

Solution: I can create a fix which will ensure all values stored will be retrieved with the same type at storage time. I can create this fix and add extensive tests within this PR or I can create a new PR for this fix. I believe its important to fix this before moving onto filtering based on JSON

  1. There is an issue with sqlparser. It isn't compatible with our style of querying JSON. When I pass this query SELECT $key, $value FROM 'match:100:*' WHERE $value.field1.field2.field3 > 1 ORDER BY $value DESC LIMIT 3 it gives me this error on parsing the SQL statement "error parsing SQL statement: syntax error at position 67". This error does disappear if I limit my query to two nested fields rather than 3 but then the issue arises that it ignores $value and considers field2 to be our column name.

Solution: We can follow the convention present in PostgresSQL and query json using ->. this will transform our query into this format: SELECT $key, $value FROM 'match:100:*' WHERE $value->field1->field2->field3 > 1 ORDER BY $value DESC LIMIT 3. What are your opinions on this approach

Thanks for the extensive details @YahyaHaq! Please go ahead with including the fix for (1) in this PR. Secondly, the arrow convention is a good approach for nested field access. Please go ahead with it.

Thanks a lot for catching these issues and suggesting fixes!

JyotinderSingh avatar Aug 03 '24 08:08 JyotinderSingh

@arpitbbhayani @JyotinderSingh A few updates regarding my last message:

Regarding (2) I just confirmed that passing in query in this format SELECT _key, _value FROM 'match:100:*' WHERE _value->field1 = '11' ORDER BY _value DESC LIMIT 3 does not work either for SQLParser. I receive the error "panic: syntax error at position 60 near 'field1'". I can create an issue on github and see if I can do anything to resolve it myself and in the meantime I can also check if there is any other SQLParser that will suit us better. If there is any other solution you can suggest that may provide a faster fix then that would be great but until this is resolved I will not be able to complete this ticket

YahyaHaq avatar Aug 03 '24 11:08 YahyaHaq

@arpitbbhayani @JyotinderSingh

A few updates regarding my last message:

Regarding (2) I just confirmed that passing in query in this format SELECT _key, _value FROM 'match:100:*' WHERE _value->field1 = '11' ORDER BY _value DESC LIMIT 3 does not work either for SQLParser. I receive the error "panic: syntax error at position 60 near 'field1'". I can create an issue on github and see if I can do anything to resolve it myself and in the meantime I can also check if there is any other SQLParser that will suit us better. If there is any other solution you can suggest that may provide a faster fix then that would be great but until this is resolved I will not be able to complete this ticket

You should replace _value with $value and _key with $key. But this issue does not seem related to that. Let me think over it. In the mean time you can redirect this PR to only fix (1) and we can create a new PR for (2) after appropriate discussion.

JyotinderSingh avatar Aug 03 '24 11:08 JyotinderSingh

in our code we replace $value with _value before passing it into the parser

YahyaHaq avatar Aug 03 '24 11:08 YahyaHaq

Regarding (1)

  • I think I should also be assigned the issue https://github.com/DiceDB/dice/issues/188 as if i work on this issue my changes will be overwriting the integration tests in this file. If the person assigned to this hasn't created the PR then I can work on this otherwise I can work on top of their PR. What do you think

YahyaHaq avatar Aug 03 '24 11:08 YahyaHaq

@YahyaHaq Let's do one thing, to support the JSONPath syntax let's require the user to pass the JSONPath surrounded by backticks, as follows:

SELECT $key, $value FROM 'match:100:*' WHERE `$value.field1.field2.field3` > 1 ORDER BY $value DESC LIMIT 3

The parser should be able to handle this. How does that sound?

JyotinderSingh avatar Aug 03 '24 16:08 JyotinderSingh

Regarding (1)

  • I think I should also be assigned the issue Add integration test for testing WHERE clause in QWATCH. #188 as if i work on this issue my changes will be overwriting the integration tests in this file. If the person assigned to this hasn't created the PR then I can work on this otherwise I can work on top of their PR. What do you think

Let's let it be for now, we can merge whichever PR gets ready first and then work from there.

JyotinderSingh avatar Aug 03 '24 16:08 JyotinderSingh

@JyotinderSingh Thank you so much for that SQL solution. This works great!

YahyaHaq avatar Aug 04 '24 14:08 YahyaHaq

@JyotinderSingh Thank you so much for that SQL solution. This works great!

Feel free to raise a draft PR, I can provide some ongoing feedback.

JyotinderSingh avatar Aug 04 '24 14:08 JyotinderSingh

Hi @YahyaHaq Any updates on this issue?

JyotinderSingh avatar Aug 07 '24 14:08 JyotinderSingh

Hey @JyotinderSingh apologies for any delay on my end. Will try to send a PR by Saturday. There are no current blockers for this task

YahyaHaq avatar Aug 08 '24 13:08 YahyaHaq

Hey @JyotinderSingh apologies for any delay on my end. Will try to send a PR by Saturday. There are no current blockers for this task

Hi @YahyaHaq, wondering if you'll be able to close this anytime soon?

JyotinderSingh avatar Aug 16 '24 04:08 JyotinderSingh

yes. all thats left is the benchmark and the unit tests. Functionality and integration tests are done. You can take a look at my draft PR

YahyaHaq avatar Aug 16 '24 05:08 YahyaHaq

yes. all thats left is the benchmark and the unit tests. Functionality and integration tests are done. You can take a look at my draft PR

The changes look good overall, you can clean up the code and add the remaining tests.

JyotinderSingh avatar Aug 16 '24 06:08 JyotinderSingh

Thanks, @YahyaHaq for contributing. Closing this issue.

Status - shipped 🚀

AshwinKul28 avatar Aug 25 '24 18:08 AshwinKul28