data-diff
Getting "ValueError: range() arg 3 must not be zero" error for multi iteration checks
We are evaluating data-diff for our use case. We hit an issue when the diff needs more than one bisection iteration, i.e. when we reduce --bisection-threshold. It works fine when --bisection-threshold is high enough that everything is done in one iteration.
data-diff trino://[email protected]@trino-dev-coordinator-service.trino-dev.svc.cluster.local:8080/hive/sqoop_api sqoop_api.merchants trino://[email protected]@trino-dev-coordinator-service.trino-dev.svc.cluster.local:8080/hive/realtime_hudi_api realtime_hudi_api.merchants -k id -v --json --bisection-factor 9 --bisection-threshold 100000 --max-age=7d -t created_date -c name -c email -c second_factor_auth -c restricted -c parent_id -c fee_model --min-age=1d -s -w "updated_at<1659724200 and created_date<'2022-08-08'"
In the second case, we reduced --bisection-threshold enough that the diff can't be completed in one iteration:
data-diff trino://[email protected]@trino-dev-coordinator-service.trino-dev.svc.cluster.local:8080/hive/sqoop_api sqoop_api.merchants trino://[email protected]@trino-dev-coordinator-service.trino-dev.svc.cluster.local:8080/hive/realtime_hudi_api realtime_hudi_api.merchants -k id -v --json --bisection-factor 9 --bisection-threshold 1000 --max-age=7d -t created_date -c name -c email -c second_factor_auth -c restricted -c parent_id -c fee_model --min-age=1d -s -w "updated_at<1659724200 and created_date<'2022-08-08'"
We get the following error:
ValueError: range() arg 3 must not be zero
File "/usr/lib/python3.9/concurrent/futures/_base.py", line 600, in result_iterator
yield fs.pop().result()
File "/usr/lib/python3.9/concurrent/futures/_base.py", line 433, in result
return self.__get_result()
File "/usr/lib/python3.9/concurrent/futures/_base.py", line 389, in __get_result
raise self._exception
File "/usr/lib/python3.9/concurrent/futures/thread.py", line 52, in run
result = self.fn(*self.args, **self.kwargs)
File "/usr/local/lib/python3.9/dist-packages/data_diff/diff_tables.py", line 493, in _diff_tables
yield from self._bisect_and_diff_tables(table1, table2, level=level, max_rows=max(count1, count2))
File "/usr/local/lib/python3.9/dist-packages/data_diff/diff_tables.py", line 446, in _bisect_and_diff_tables
checkpoints = table1.choose_checkpoints(self.bisection_factor - 1)
File "/usr/local/lib/python3.9/dist-packages/data_diff/diff_tables.py", line 180, in choose_checkpoints
checkpoints = split_space(self.min_key.int, self.max_key.int, count)
File "/usr/local/lib/python3.9/dist-packages/data_diff/utils.py", line 19, in split_space
return list(range(start, end, (size + 1) // (count + 1)))[1 : count + 1]
PS: We are using this branch because we have alphanumeric ids:
pip install git+https://github.com/datafold/data-diff.git@alphanum_ids
https://github.com/datafold/data-diff/issues/59#issuecomment-1194403178
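The last frame of the traceback above computes the `range()` step as `(size + 1) // (count + 1)`. A minimal sketch of how that step can become zero follows. It assumes `size` is `end - start`, since only the `return` line appears in the traceback, and the key ranges used here are made up for illustration. The thread does not confirm that this is what happened in the reported run.

```python
def split_space(start, end, count):
    # Assumption: size is the width of the key range. Only the return
    # line below is taken from the traceback.
    size = end - start
    return list(range(start, end, (size + 1) // (count + 1)))[1 : count + 1]


# With --bisection-factor 10, choose_checkpoints is called with count = 9.

# Wide key range: the step is (1000 + 1) // (9 + 1) = 100.
wide = split_space(0, 1000, 9)

# Narrow key range: the step is (5 + 1) // (9 + 1) = 0, so range() raises.
try:
    split_space(0, 5, 9)
    narrow_error = None
except ValueError as exc:
    narrow_error = str(exc)

print(wide)
print(narrow_error)
```

Under that assumption, any segment whose key range is narrower than the number of checkpoints requested would trigger the same `ValueError`, which would be consistent with the error appearing only once a lower threshold forces further bisection.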
Thanks for reporting this. I can't reproduce it, so it would be helpful if you could let me know the values that are being used.
Before the line:
checkpoints = split_space(self.min_key.int, self.max_key.int, count)
If you could add -
print("$$$$$", self.min_key, self.max_key, count)
And paste here the results?
These are the values
-k id -v --json --bisection-factor 10 --bisection-threshold 1000 --max-age=7d
It seems I don't have permission on GitHub to push the above change:
Permission to datafold/data-diff.git denied to gaurav1308.
These are the values
That's not what I asked..
Permission to datafold/data-diff.git denied
Yes, of course. Why would you have permissions to push to data-diff? Contributions have to come in the form of pull requests.
Params and inputs:
data-diff trino://[email protected]@trino-dev-coordinator-service.trino-dev.svc.cluster.local:8080/hive/sqoop_api sqoop_api.merchants trino://[email protected]@trino-dev-coordinator-service.trino-dev.svc.cluster.local:8080/hive/realtime_hudi_api realtime_hudi_api.merchants -k id --json --bisection-factor 10 --bisection-threshold 1000 --max-age=7d -t created_date -c name -c email -c second_factor_auth -c restricted -c parent_id -c fee_model -v
Attaching log file error.txt
@erezsh Let me know if that helps
@gaurav1308 That's exactly what I need, thank you. Let me look into it and see if I can find the problem.
We have a new implementation for alphanumerics in master that I believe should fix this issue.
Sorry it took so long, but please try now and see if it helps.
Looks like this was fixed