
where clause behaving weirdly

Open akulgoel96 opened this issue 1 year ago • 1 comments

Describe the bug

I am currently setting up data-diff and have been getting unexpected results from the where parameter. My data-diff command works fine without a where condition, but it fails when I provide one.

So, data-diff trino://[email protected]@trino-dev-coordinator-service.trino-dev.svc.cluster.local:8080/hive/sqoop_api sqoop_api.merchants trino://[email protected]@trino-dev-coordinator-service.trino-dev.svc.cluster.local:8080/hive/realtime_hudi_api realtime_hudi_api.merchants -k id -v --json --bisection-factor 6 --bisection-threshold 100000000

This works fine, but the following does not:

data-diff trino://[email protected]@trino-dev-coordinator-service.trino-dev.svc.cluster.local:8080/hive/sqoop_api sqoop_api.merchants trino://[email protected]@trino-dev-coordinator-service.trino-dev.svc.cluster.local:8080/hive/realtime_hudi_api realtime_hudi_api.merchants -k id -v --json --bisection-factor 6 --bisection-threshold 100000000 -w "created_date = '2022-08-02'"

Stack trace attached.

The command above fails with this error: ValueError: Error: min_key expected to be smaller than max_key!

Here's where it gets interesting, though: if I provide an earlier date in the where condition, the command works fine: data-diff trino://[email protected]@trino-dev-coordinator-service.trino-dev.svc.cluster.local:8080/hive/sqoop_api sqoop_api.merchants trino://[email protected]@trino-dev-coordinator-service.trino-dev.svc.cluster.local:8080/hive/realtime_hudi_api realtime_hudi_api.merchants -k id -v --json --bisection-factor 6 --bisection-threshold 100000000 -w "created_date = '2022-08-01'"

Describe the environment

Using the master version of airbyte.

akulgoel96, Aug 07 '22 16:08

@akulgoel96 Thank you for reporting this. However, I cannot reproduce the error, and it's not clear why it happens. Can you please find out the values of min_key / max_key just before the exception occurs? That might give us a clue.

Even better: in diff_tables.py, inside the diff_tables() function, print the value of key_ranges, and then min_key1 and max_key1.

Thanks!
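The key range requested above can also be approximated without editing data-diff, by running a min/max query on the key column with the same filter. The sketch below is only an illustration under stated assumptions: it uses an in-memory SQLite table as a stand-in for the Trino tables, the sample rows are made up, and key_range is a hypothetical helper, not part of data-diff. It shows that a filter matching a single row gives min equal to max, and a filter matching no rows gives NULL for both. Whether either case is behind the error in this issue is unconfirmed.

```python
import sqlite3


def key_range(conn, table, key, where=None):
    """Return (min, max, row count) of `key` in `table`, optionally filtered.

    Hypothetical helper for illustration; not part of data-diff.
    """
    sql = f"SELECT min({key}), max({key}), count(*) FROM {table}"
    if where:
        sql += f" WHERE {where}"
    return conn.execute(sql).fetchone()


# Stand-in table with made-up rows; the real check would run against Trino.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE merchants (id INTEGER, created_date TEXT)")
conn.executemany(
    "INSERT INTO merchants VALUES (?, ?)",
    [(1, "2022-08-01"), (2, "2022-08-01"), (3, "2022-08-02")],
)

filters = (
    None,
    "created_date = '2022-08-01'",
    "created_date = '2022-08-02'",
    "created_date = '2022-08-03'",
)
for where in filters:
    print(where, key_range(conn, "merchants", "id", where))
```

Running the equivalent query on both sqoop_api.merchants and realtime_hudi_api.merchants with each date would show whether the failing filter yields an empty or single-valued key range on either side.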

erezsh, Aug 09 '22 16:08

Closed due to inactivity.

erezsh, Sep 20 '22 11:09