Support multiprocessing for the process module

Open maxbachmann opened this issue 2 years ago • 6 comments

All the algorithms in the process module should be fairly simple to run in parallel.

maxbachmann, Sep 16 '21 15:09

This is now supported for process.cdist using the workers argument.

maxbachmann, Sep 27 '21 15:09

@maxbachmann Is multiprocessing supported in extractOne?

shubhamscifi, Apr 13 '22 11:04

So far multiprocessing is only supported by process.cdist.

maxbachmann, Apr 13 '22 12:04

So if I want multiprocessing in the 1:n scenario (process.extract), what would you recommend currently? Is using process.cdist with a single-item query going to be better than a custom loop over the scorer (turned parallel via Python multiprocessing or some other paradigm)?

bertsky, Oct 26 '22 17:10

process.cdist does not really parallelize the 1:n scenario. It currently parallelizes only the outer loop, so the workers argument has basically no effect when there is a single query. For now you should probably use Python multiprocessing instead.

maxbachmann, Oct 27 '22 19:10

Understood. Thanks for the clarification!

bertsky, Oct 27 '22 19:10