PalDB
PalDB copied to clipboard
why paldb is not thread safe,it's read only
maybe these code in StorageReader.java
indexBuffer.position(indexOffset + slot * slotSize); indexBuffer.get(slotBuffer);
Hi,
As is it said "PalDB is not thread-safe at the moment so synchronization should be done externally if multi-threaded."
So don´t worry about what is going on internally, just add a synchronized block in your code.
Simple reason, lack of time to implement it (well) in 1.0. Hard to judge what would be the performance impact but I would certainly make it an option one can disable. I also thought it wouldn't be a large barrier as simple client-based synchronization can be done quickly.
Hi, as a read-only store, most users expect the throughput could be almost linearly scalable in multi-thread environment. I am a little surprised when finally find out PalDB needs external synchronization.
I haven't deep dive into the code base, but it should be trivial to replace the two lines which @wanggaohang mentioned before with one atomic bytebuffer read.
maybe the Cache would be a blocker issue for lockless read, but I think we can simply disable it (or replace it with a guava cache which is thread-safe).
I made it thread safe here and opened a pull request
I'm interested in this issue as well. Any updates regarding the multi-thread support? Anyone can explain why a read-only data store is not thread-safe by default?
Thank you in advance for your answers.
Alessandro
@aleSuglia The reason this isn't simple is basically because the MappedByteBuffer
structures the underlying reader uses as "windows" into the mapped file are stateful as @wanggaohang mentioned. The only way a ByteBuffer
can be used by multiple threads is not relying on it's position at all by using getXXX(pos)
. But StorageReader
doesn't have a fixed key length, so it needs to pull a dynamic number of bytes out of the buffer, making this approach unfeasible.
The other option is making copies of the ByteBuffer
, either local (as @xiaoxxcool has done for PR #33) or thread local.
There might be other areas where threads might overwrite each other's internal data (I've seen the hashing impl being mentioned), but the reader access is the most essential one.