lo

Proposal: Parallel Uniq and Find

Open steve-hb opened this issue 1 year ago • 0 comments

While working with millions (up to billions) of structs in a big slice, I found myself needing to remove duplicates, because of database limitations around transactions that contain duplicate updates/inserts. Done sequentially, this would take hours at current speeds. Before building more complex structures (e.g. hash-based ones), I'd prefer to just run the task in parallel; since I don't care about order, this shouldn't be too hard.

Therefore, I propose implementing the following: `func UniqByParallel[T comparable](slice []T, numThreads int, comparator func(item T, other T) bool) []T`

I could also share my own (very simplistic and unoptimised) version of this, but first I would like to know what others think and what problems they might see that I don't.

PS: I searched for other libraries and solutions but didn't find any easy alternatives. Maybe someone knows of one :)

steve-hb, Nov 27 '23 21:11