fuzzywuzzy

`process.dedupe()` gives IndexError: list index out of range because of a bug in `process.extractWithoutOrder()`

Status: Open. Thijsvandepoll opened this issue 3 years ago (0 comments).

Hi all,

I found a bug in `process.extractWithoutOrder()` that causes `process.dedupe()` to fail unexpectedly. For example:

```python
from fuzzywuzzy import process

process.dedupe(["BRITT JEFFREY S", "BRITT JEFFREY S.", "WIEDEMAN SCOTT", "WIEDERMANN SCOTT", "斯科特·维德曼", "杰弗里·S·布里特"])
```

which results in:

```
IndexError: list index out of range
```

The expected result here is:

```
dict_keys(['BRITT JEFFREY S.', 'WIEDERMANN SCOTT', '斯科特·维德曼', '杰弗里·S·布里特'])
```

I looked into the source code, and I believe the bug is in `process.extractWithoutOrder()`, which applies a different (pre)processor to the query than to the choices. I will open a pull request to fix this issue.
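To illustrate how a query/choice processing mismatch can surface as an IndexError, here is a minimal self-contained sketch. It does not use fuzzywuzzy: `unicode_process`, `ascii_process`, `score`, `extract_without_order` and `dedupe` are hypothetical stand-ins (difflib's `SequenceMatcher` replaces `fuzz.token_set_ratio`, and this `dedupe` returns a list rather than `dict_keys`). It assumes a dedupe that picks its canonical string by indexing element `[0]` of the thresholded match list. If the query keeps CJK characters but the choices are stripped to ASCII, an item like `斯科特·维德曼` no longer matches even itself, so that list is empty.

```python
import re
from difflib import SequenceMatcher
from typing import Callable, Iterable, Iterator, List, Tuple

Processor = Callable[[str], str]


def unicode_process(s: str) -> str:
    """Hypothetical normalizer: lowercase and collapse non-word runs to one
    space. Non-ASCII letters such as CJK characters are kept."""
    return re.sub(r"\W+", " ", s.lower()).strip()


def ascii_process(s: str) -> str:
    """Hypothetical ASCII-only normalizer: non-ASCII characters are dropped,
    so a purely CJK string becomes empty."""
    s = s.encode("ascii", "ignore").decode("ascii")
    return re.sub(r"[^0-9a-z]+", " ", s.lower()).strip()


def score(a: str, b: str) -> int:
    """0-100 similarity via difflib (a stand-in for fuzz.token_set_ratio).
    An empty processed string scores 0 against everything."""
    if not a or not b:
        return 0
    return round(100 * SequenceMatcher(None, a, b).ratio())


def extract_without_order(query: str, choices: Iterable[str],
                          query_processor: Processor,
                          choice_processor: Processor) -> Iterator[Tuple[str, int]]:
    """Yield (choice, score) pairs. Passing two different processors mimics
    the query/choice mismatch described in the report."""
    processed_query = query_processor(query)
    for choice in choices:
        yield choice, score(processed_query, choice_processor(choice))


def dedupe(items: List[str], threshold: int = 70,
           query_processor: Processor = unicode_process,
           choice_processor: Processor = unicode_process) -> List[str]:
    canonical = []
    for item in items:
        matches = [m for m in extract_without_order(
            item, items, query_processor, choice_processor) if m[1] > threshold]
        # Each item should at least match itself with score 100. If the query
        # and the choices are processed differently, that self-match can be
        # lost, `matches` is empty, and best[0] below raises IndexError.
        best = sorted(matches, key=lambda m: (-len(m[0]), m[0]))
        canonical.append(best[0][0])  # longest match wins; ties broken alphabetically
    return list(dict.fromkeys(canonical))  # drop repeats, keep first-seen order
```

With one processor for both sides, the names from the report collapse to `['BRITT JEFFREY S.', 'WIEDERMANN SCOTT', '斯科特·维德曼', '杰弗里·S·布里特']`; calling `dedupe(items, choice_processor=ascii_process)` raises `IndexError` on the first CJK item. The fix in this sketch is simply to process the query and the choices the same way.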

Thijsvandepoll, Apr 02 '21 09:04