
Attempting a write while doing a read prevents all future reads until the write finishes, even if the write is blocking on another read

Open cep21 opened this issue 9 years ago • 7 comments

Current behavior

If one takes a long read lock, for example when downloading a database via HTTP, the read holds mmap.RLock() for its whole duration. Other reads are still allowed while that RLock() is held. If a write operation then happens, it blocks on its mmap.Lock() call and causes all subsequent mmap.RLock() calls to block as well. This means that a single write stops all read operations that could otherwise proceed until the longest-running read finishes.

Reason

Unfortunately, the documentation for sync.RWMutex includes the following comment:

// To ensure that the lock eventually becomes available,
// a blocked Lock call excludes new readers from acquiring
// the lock.
func (rw *RWMutex) Lock() {

This means if any write operation tries to take a Lock() and cannot because a read operation is happening in the background, then that write operation will prevent all future read operations from occurring.
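
To make the behavior concrete, here is a minimal sketch using only sync.RWMutex, not bolt itself. The function name newReaderBlocked is made up for illustration, the sleep is a crude way to let the writer reach Lock(), and TryRLock requires Go 1.18 or later.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// newReaderBlocked reports whether a fresh read lock is refused while a
// writer is queued behind a long-running reader. Hypothetical helper,
// written only for this demonstration.
func newReaderBlocked() bool {
	var mu sync.RWMutex

	mu.RLock() // the long-running reader (for example, an HTTP download)

	writerDone := make(chan struct{})
	go func() {
		mu.Lock() // blocks until the long reader releases its read lock
		mu.Unlock()
		close(writerDone)
	}()

	// Give the writer goroutine time to reach Lock. A sleep is crude, but
	// it is enough for a demonstration.
	time.Sleep(100 * time.Millisecond)

	// TryRLock fails when a writer holds the lock or is waiting for it.
	acquired := mu.TryRLock()
	if acquired {
		mu.RUnlock()
	}

	mu.RUnlock() // the long reader finishes, so the writer can proceed
	<-writerDone
	return !acquired
}

func main() {
	fmt.Println("new reader blocked by pending writer:", newReaderBlocked())
}
```

A plain RLock() in place of TryRLock() would simply hang at that point until the long reader released its lock and the writer finished, which is the stall described above.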

Ideal behavior

When doing a long read operation, other read operations should be allowed while writes block. When a write is attempted, it should block on outstanding reads but not prevent future reads.

Possible solution

It may be important to document that any long read operation, like the suggested HTTP download, should disable DB writes while it runs; otherwise the database could get into a locked state.

cep21 avatar Jan 25 '16 22:01 cep21

@cep21

Please check https://github.com/boltdb/bolt/blob/master/db.go#L860-L868.

You can give it a large mmap size. That does not grow the file on disk; it just uses more virtual memory, which is generally not a problem.

xiang90 avatar Jan 25 '16 22:01 xiang90

What should I do for databases that I expect to be gigabytes in size? Should this be in the billions? Will this work for databases that are already created?

cep21 avatar Jan 25 '16 22:01 cep21

@cep21 Well... I cannot answer this question for you. It is your database, and you are the best person to estimate its size. If you have no expectation of your database size, this solution will not work well for you. It should work for databases that have already been created, but you had better try it out first.

xiang90 avatar Jan 25 '16 23:01 xiang90

I see. I'm trying to figure out how allocate() works. It looks like when allocate() isn't able to find memory, it grows the mmap by only a small amount. Is that true? Would it be more efficient to grow mmap() by some larger size over time?

cep21 avatar Jan 25 '16 23:01 cep21

Never mind, I spotted mmapSize and noticed the growth algorithm.
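
For anyone else reading, a rough sketch of the kind of policy I mean: double the mapping until it reaches 1 GB, then grow in 1 GB steps. The constants and the name nextMmapSize are assumptions based on my reading of the code, not bolt's actual API, and the real function also handles page-size rounding and a maximum map size that this sketch leaves out.

```go
package main

import "fmt"

const (
	minMmapShift = 15      // 32 KB: assumed smallest mapping
	maxMmapShift = 30      // 1 GB: assumed point where doubling stops
	mmapStep     = 1 << 30 // assumed growth step beyond 1 GB
)

// nextMmapSize returns the mapping size for a database that needs at least
// size bytes: the next power of two between 32 KB and 1 GB, and above that
// the next multiple of 1 GB. Hypothetical helper for illustration only.
func nextMmapSize(size int64) int64 {
	for shift := minMmapShift; shift <= maxMmapShift; shift++ {
		if size <= 1<<shift {
			return 1 << shift
		}
	}
	// Past 1 GB, round up to the next 1 GB boundary.
	if rem := size % mmapStep; rem > 0 {
		size += mmapStep - rem
	}
	return size
}

func main() {
	for _, need := range []int64{1, 40_000, 1 << 30, 1<<30 + 1} {
		fmt.Printf("need %d bytes -> map %d bytes\n", need, nextMmapSize(need))
	}
}
```

Under this policy a remap becomes less frequent as the database grows, so the write that has to wait for readers is the exception.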

cep21 avatar Jan 25 '16 23:01 cep21

@cep21 Awesome. Thanks!

xiang90 avatar Jan 25 '16 23:01 xiang90

Worth clarifying:

"If any write operation happens, that write operation will"

This is only true for writes that need to grow the database. A typical write would still proceed. Only the unlucky writes that trigger a resize will wait.

"When doing a long read operation, other read operations should be allowed while writes block. When a write is attempted, it should block on outstanding reads but not prevent future reads."

That would starve the writer (the one that needs to grow): it would never make forward progress as long as there are readers active. The way it is programmed, once a resize is needed, further readers are blocked until the resize can happen. Same reasoning as for sync.RWMutex.
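
A toy model of that starvation, for illustration only. readerPreferringLock and writerGetsIn are made-up names, and this is a single-goroutine simulation of the proposed "never block new readers" policy, not a real lock and not bolt code.

```go
package main

import "fmt"

// readerPreferringLock models the proposed policy: readers are always
// admitted, and a writer gets in only when no reader is active.
type readerPreferringLock struct{ readers int }

func (l *readerPreferringLock) RLock()        { l.readers++ }
func (l *readerPreferringLock) RUnlock()      { l.readers-- }
func (l *readerPreferringLock) TryLock() bool { return l.readers == 0 }

// writerGetsIn replays a stream of overlapping reads and reports whether
// the writer was ever admitted during it.
func writerGetsIn(rounds int) bool {
	var l readerPreferringLock
	l.RLock() // the first reader
	for i := 0; i < rounds; i++ {
		l.RLock()   // the next reader starts...
		l.RUnlock() // ...before the previous one finishes
		if l.TryLock() {
			return true
		}
	}
	l.RUnlock()
	return false
}

func main() {
	fmt.Println("writer admitted:", writerGetsIn(1000))
}
```

Because each new read begins before the previous one ends, the reader count never reaches zero and the writer is never admitted, however many rounds are replayed. Blocking new readers once a writer is waiting is what breaks that cycle.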

Now, let's take a step back and look at the actual use case:

"If one takes a long read lock, for example when downloading a database via HTTP,"

The docs: "If a long running read transaction (for example, a snapshot transaction) is needed, you might want to set DB.InitialMmapSize to a large enough value to avoid potential blocking of write transaction." https://godoc.org/github.com/boltdb/bolt#DB.Begin

You can also decrease the duration of your slow read operation by cloning the database with https://godoc.org/github.com/boltdb/bolt#Tx.WriteTo (or OS level mechanisms for atomic file snapshots), and then you can serve slow operations from the copy.

tv42 avatar Sep 03 '16 00:09 tv42