
Getting "ValueError: range() arg 3 must not be zero" error for multi iteration checks

Open gaurav1308 opened this issue 1 year ago • 7 comments

We are evaluating data-diff for our use case. We hit an issue when multi-step iteration is performed, i.e. when we reduce bisection-threshold. Everything works fine when bisection-threshold is high enough that the whole diff is done in one iteration:

data-diff trino://[email protected]@trino-dev-coordinator-service.trino-dev.svc.cluster.local:8080/hive/sqoop_api sqoop_api.merchants trino://[email protected]@trino-dev-coordinator-service.trino-dev.svc.cluster.local:8080/hive/realtime_hudi_api realtime_hudi_api.merchants -k id -v --json --bisection-factor 9 --bisection-threshold 100000 --max-age=7d -t created_date -c name -c email -c second_factor_auth -c restricted -c parent_id -c fee_model --min-age=1d -s -w "updated_at<1659724200 and created_date<'2022-08-08'"

In the second case, we reduced bisection-threshold enough that the diff can't be performed in one iteration:

data-diff trino://[email protected]@trino-dev-coordinator-service.trino-dev.svc.cluster.local:8080/hive/sqoop_api sqoop_api.merchants trino://[email protected]@trino-dev-coordinator-service.trino-dev.svc.cluster.local:8080/hive/realtime_hudi_api realtime_hudi_api.merchants -k id -v --json --bisection-factor 9 --bisection-threshold 1000 --max-age=7d -t created_date -c name -c email -c second_factor_auth -c restricted -c parent_id -c fee_model --min-age=1d -s -w "updated_at<1659724200 and created_date<'2022-08-08'"

With this command we get the following error:

ValueError: range() arg 3 must not be zero

 File "/usr/lib/python3.9/concurrent/futures/_base.py", line 600, in result_iterator
    yield fs.pop().result()
  File "/usr/lib/python3.9/concurrent/futures/_base.py", line 433, in result
    return self.__get_result()
  File "/usr/lib/python3.9/concurrent/futures/_base.py", line 389, in __get_result
    raise self._exception
  File "/usr/lib/python3.9/concurrent/futures/thread.py", line 52, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/usr/local/lib/python3.9/dist-packages/data_diff/diff_tables.py", line 493, in _diff_tables
    yield from self._bisect_and_diff_tables(table1, table2, level=level, max_rows=max(count1, count2))
  File "/usr/local/lib/python3.9/dist-packages/data_diff/diff_tables.py", line 446, in _bisect_and_diff_tables
    checkpoints = table1.choose_checkpoints(self.bisection_factor - 1)
  File "/usr/local/lib/python3.9/dist-packages/data_diff/diff_tables.py", line 180, in choose_checkpoints
    checkpoints = split_space(self.min_key.int, self.max_key.int, count)
  File "/usr/local/lib/python3.9/dist-packages/data_diff/utils.py", line 19, in split_space
    return list(range(start, end, (size + 1) // (count + 1)))[1 : count + 1]
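
The last frame of the traceback suggests why this only shows up on small segments: the `range()` step is an integer division, so it becomes zero once the key range is narrower than the number of requested checkpoints. The sketch below reproduces that arithmetic in isolation. It assumes `size = end - start`, since the traceback shows only the `return` line, and `split_space_guarded` is a hypothetical variant for illustration, not the fix that was applied in data-diff.

```python
def split_space(start, end, count):
    # Assumption: size is the width of the key range; the traceback
    # only shows the return line, not how size is computed.
    size = end - start
    return list(range(start, end, (size + 1) // (count + 1)))[1 : count + 1]


def split_space_guarded(start, end, count):
    # Hypothetical variant: clamp the step to at least 1 so that a
    # narrow key range yields fewer checkpoints instead of raising.
    size = end - start
    step = max(1, (size + 1) // (count + 1))
    return list(range(start, end, step))[1 : count + 1]


if __name__ == "__main__":
    # Wide key range: step is 101 // 10 == 10, so 9 checkpoints come back.
    print(split_space(0, 100, 9))

    # Narrow key range: step is 6 // 10 == 0, which range() rejects.
    try:
        split_space(5, 10, 9)
    except ValueError as exc:
        print("ValueError:", exc)

    # The clamped variant returns the few keys that exist instead.
    print(split_space_guarded(5, 10, 9))
```

With `--bisection-factor 9` the call passes `count = 8`, so under the assumption above any segment whose min and max keys are fewer than 8 apart would produce a zero step.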

gaurav1308 avatar Aug 09 '22 15:08 gaurav1308

PS: We are using this branch because we have alphanumeric ids:

pip install git+https://github.com/datafold/data-diff.git@alphanum_ids

https://github.com/datafold/data-diff/issues/59#issuecomment-1194403178

gaurav1308 avatar Aug 09 '22 15:08 gaurav1308

Thanks for reporting this. I can't reproduce it, so it would be helpful if you could let me know the values that are being used.

Before the line:

            checkpoints = split_space(self.min_key.int, self.max_key.int, count)

If you could add -

            print("$$$$$", self.min_key, self.max_key, count)

And paste here the results?

erezsh avatar Aug 09 '22 16:08 erezsh

These are the values: -k id -v --json --bisection-factor 10 --bisection-threshold 1000 --max-age=7d

gaurav1308 avatar Aug 28 '22 19:08 gaurav1308

It seems I don't have permission on GitHub to push the above change: "Permission to datafold/data-diff.git denied to gaurav1308."

gaurav1308 avatar Aug 28 '22 19:08 gaurav1308

These are the values

That's not what I asked..

Permission to datafold/data-diff.git denied

Yes, of course. Why would you have permissions to push to data-diff? Contributions have to come in the form of pull requests.

erezsh avatar Aug 29 '22 09:08 erezsh

Params and inputs: data-diff trino://[email protected]@trino-dev-coordinator-service.trino-dev.svc.cluster.local:8080/hive/sqoop_api sqoop_api.merchants trino://[email protected]@trino-dev-coordinator-service.trino-dev.svc.cluster.local:8080/hive/realtime_hudi_api realtime_hudi_api.merchants -k id --json --bisection-factor 10 --bisection-threshold 1000 --max-age=7d -t created_date -c name -c email -c second_factor_auth -c restricted -c parent_id -c fee_model -v

Attaching log file error.txt

@erezsh Let me know if that helps

gaurav1308 avatar Aug 29 '22 17:08 gaurav1308

@gaurav1308 That's exactly what I need, thank you. Let me look into it and see if I can find the problem.

erezsh avatar Aug 29 '22 18:08 erezsh

We have a new implementation for alphanumerics in master, which I believe should fix this issue.

Sorry it took so long, but please try now and see if it helps.

erezsh avatar Sep 30 '22 10:09 erezsh

Looks like this was fixed

gaurav1308 avatar Nov 03 '22 05:11 gaurav1308