imagededup

Optional parallelization of cosine similarity computation (Issue #95)

EduardKononov opened this issue 5 years ago • 7 comments

This PR adds a parallel_cosine_similarity parameter to find_duplicates. It defaults to True, so existing code keeps working unchanged. I have a very large dataset and not enough RAM to use multiprocessing, so the ability to disable parallel cosine-similarity computation was the only way I could use the package.
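A minimal sketch of the trade-off described above, assuming the encodings are plain lists of floats keyed by filename. The names find_similar_pairs, parallel and chunk_size are hypothetical: this is not imagededup's API or the PR's implementation, and it uses only the standard library instead of NumPy so it stays self-contained. Similarities are computed in row chunks. With parallel=True the chunks go to a ProcessPoolExecutor, and each task's arguments (including the normalized encodings) are pickled to a worker process, so memory use grows with the number of workers. With parallel=False the chunks are processed one at a time in the current process, which keeps a single copy of the encodings in memory.

```python
import math
from concurrent.futures import ProcessPoolExecutor
from typing import Dict, List, Optional, Sequence, Tuple


def _normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else [0.0] * len(vec)


def _rows_above_threshold(args) -> List[Tuple[int, int, float]]:
    """Compare rows [start, stop) with every later row; keep pairs >= threshold."""
    start, stop, unit, threshold = args
    pairs = []
    for i in range(start, stop):
        for j in range(i + 1, len(unit)):
            sim = sum(a * b for a, b in zip(unit[i], unit[j]))
            if sim >= threshold:
                pairs.append((i, j, sim))
    return pairs


def find_similar_pairs(
    encodings: Dict[str, Sequence[float]],
    threshold: float = 0.9,
    parallel: bool = True,  # hypothetical flag, mirroring the idea of the PR
    chunk_size: int = 256,  # hypothetical: rows handled per task
    max_workers: Optional[int] = None,
) -> Dict[str, List[Tuple[str, float]]]:
    names = list(encodings)
    unit = [_normalize(encodings[n]) for n in names]
    tasks = [
        (start, min(start + chunk_size, len(unit)), unit, threshold)
        for start in range(0, len(unit), chunk_size)
    ]
    if parallel:
        # Each task's arguments, including `unit`, are pickled and sent to a
        # worker process, so every busy worker holds its own copy.
        with ProcessPoolExecutor(max_workers=max_workers) as pool:
            chunks = list(pool.map(_rows_above_threshold, tasks))
    else:
        # Serial path: one chunk at a time, no extra copies of `unit`.
        chunks = [_rows_above_threshold(task) for task in tasks]

    # Record each similar pair symmetrically under both filenames.
    result: Dict[str, List[Tuple[str, float]]] = {n: [] for n in names}
    for chunk in chunks:
        for i, j, sim in chunk:
            result[names[i]].append((names[j], round(sim, 4)))
            result[names[j]].append((names[i], round(sim, 4)))
    return result
```

Calling find_similar_pairs(encodings, parallel=False) trades speed for a lower memory footprint. When parallel=True is used on platforms that start workers with the spawn method, the calling code needs the usual `if __name__ == "__main__":` guard.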

EduardKononov, Oct 13 '20

Oh, I haven't updated the tests...

EduardKononov, Oct 13 '20

Related to #95. Also, please fork from the dev branch, not master; have a look at the contribution guide.

tanujjain, Nov 17 '20

@EduardKononov Do you intend to work on this?

tanujjain, Dec 01 '20

@tanujjain Yes, I do, but later: I have no free time at all right now. My deadline is mid-January; hopefully sooner.

EduardKononov, Dec 01 '20

Unfortunately, I still haven't had time to do it. I just wanted to let you know that I haven't forgotten; I simply haven't had the opportunity.

EduardKononov, Jan 15 '21

@EduardKononov Thanks for the info. Do you think you'll have time in the coming weeks? Otherwise, I may have to start working on it sometime in February (2nd/3rd week).

tanujjain, Jan 15 '21

@tanujjain No, I don't think so.

EduardKononov, Jan 15 '21

Closing, since this is being tackled in #185.

tanujjain, Dec 28 '22