PalDB icon indicating copy to clipboard operation
PalDB copied to clipboard

why paldb is not thread safe,it's read only

Open wanggaohang opened this issue 9 years ago • 7 comments

wanggaohang avatar Jan 06 '16 06:01 wanggaohang

maybe these code in StorageReader.java

indexBuffer.position(indexOffset + slot * slotSize); indexBuffer.get(slotBuffer);

wanggaohang avatar Jan 06 '16 07:01 wanggaohang

Hi,

As is it said "PalDB is not thread-safe at the moment so synchronization should be done externally if multi-threaded."

So don´t worry about what is going on internally, just add a synchronized block in your code.

jscari avatar Jan 06 '16 09:01 jscari

Simple reason, lack of time to implement it (well) in 1.0. Hard to judge what would be the performance impact but I would certainly make it an option one can disable. I also thought it wouldn't be a large barrier as simple client-based synchronization can be done quickly.

mbastian avatar Jan 06 '16 09:01 mbastian

Hi, as a read-only store, most users expect the throughput could be almost linearly scalable in multi-thread environment. I am a little surprised when finally find out PalDB needs external synchronization.

I haven't deep dive into the code base, but it should be trivial to replace the two lines which @wanggaohang mentioned before with one atomic bytebuffer read.

maybe the Cache would be a blocker issue for lockless read, but I think we can simply disable it (or replace it with a guava cache which is thread-safe).

thekiki avatar Jan 24 '17 05:01 thekiki

I made it thread safe here and opened a pull request

xiaoxxcool avatar Apr 22 '17 15:04 xiaoxxcool

I'm interested in this issue as well. Any updates regarding the multi-thread support? Anyone can explain why a read-only data store is not thread-safe by default?

Thank you in advance for your answers.

Alessandro

aleSuglia avatar Jun 15 '17 11:06 aleSuglia

@aleSuglia The reason this isn't simple is basically because the MappedByteBuffer structures the underlying reader uses as "windows" into the mapped file are stateful as @wanggaohang mentioned. The only way a ByteBuffer can be used by multiple threads is not relying on it's position at all by using getXXX(pos). But StorageReader doesn't have a fixed key length, so it needs to pull a dynamic number of bytes out of the buffer, making this approach unfeasible.

The other option is making copies of the ByteBuffer, either local (as @xiaoxxcool has done for PR #33) or thread local.

There might be other areas where threads might overwrite each other's internal data (I've seen the hashing impl being mentioned), but the reader access is the most essential one.

wundrian avatar Jun 16 '17 17:06 wundrian