cuckoofilter icon indicating copy to clipboard operation
cuckoofilter copied to clipboard

Would it be better to support concurrent insert?

Open seaguest opened this issue 5 years ago • 5 comments

Hello,

If we have huge data to process, we may need a lot insert, but I didn't see any mutex in code, would it be better to support concurrent insert, or is there any other reason for not?

seaguest avatar Nov 02 '18 09:11 seaguest

I'd have to set a mutex into the insertion call, it might as well be done by the caller... WDYT?

seiflotfy avatar Nov 04 '18 14:11 seiflotfy

I think it would be better to integrate the mutex inside the filter to make it a complete module, with which people can use without dealing with concurrent write problem. Anyway, it's just a matter of choice.

seaguest avatar Nov 17 '18 14:11 seaguest

It'd be interesting to see what the performance tradeoff is; some applications may not need thread safety, and a mutex might slow them down? I dunno.

Anyway, here's my solution; it's very simple and elegant:

type concurrentCuckoo struct {
	*sync.Mutex
	*cuckoo.Filter
}

cf := concurrentCuckoo{
	Filter: cuckoo.NewFilter(1000000)
	Mutex:  new(sync.Mutex)
}

cf.Lock()
cf.InsertUnique([]byte("hello!"))
cf.Unlock()

Side-note: It would be nice to reduce typing repetition in this package a bit. I'll submit another issue.

mholt avatar Jan 11 '19 03:01 mholt

FYI, a mutex will be a congestion point on HCC machines (like, 12 cores and up).

I'd leave the choice of guard to the data structure user, and not integrate it.

mark-kubacki avatar Jan 11 '19 14:01 mark-kubacki

Is it sufficient to do mutex only on writes? Are reads safe while insert is potentially shuffling things? Or does this need a read-write lock to allow concurrent reads when not doing writes?

(if leaving locking to callers, this is a point that's good to document)

cben avatar Dec 01 '20 17:12 cben