Additional documentation/examples related to transaction deadlock
https://github.com/etcd-io/bbolt#transactions provides some warnings about potential deadlock scenarios. However, this documentation uses a lot of uncertain language like "should", "generally", "can", etc.
It would be useful to have more certainty around these scenarios. More explicit narrative documentation could help, but my preference would be concrete examples.
e.g. at least "These example snippets will always cause a deadlock" and "... will never cause a deadlock", possibly also "... will cause deadlock if X additional condition is [not] met", with enough examples of each to cover a variety of cases of multiple transactions, threads, etc.
Thanks for the reopen. I actually lost my job over not being able to explain when this would or would not deadlock, so seeing more docs here has some personal value for me.
Hey @sparr, thanks for raising the issue. I have been reading through the README and came up with one deadlock scenario:
- Let's say we have 2 goroutines that are dependent on each other. What I mean by that is one of them needs data that is written by another goroutine. In terms of code:
func InduceDeadlockExample() {
	db, err := bbolt.Open(tempfile(), 0600, nil)
	if err != nil {
		log.Fatal(err)
	}
	defer os.Remove(db.Path())
	defer db.Close()

	// Channel to signal that the first goroutine is inside the transaction.
	inTxCh := make(chan struct{})
	// Channel for the second goroutine to send data to the first.
	dataCh := make(chan []byte)

	var wg sync.WaitGroup
	wg.Add(2)

	// Goroutine 1: Start a write transaction and wait for data.
	go func() {
		defer wg.Done()
		err := db.Update(func(tx *bbolt.Tx) error {
			// Signal that we are inside the transaction.
			close(inTxCh)
			// Wait for data from the other goroutine.
			// This will block forever because the other goroutine is waiting
			// for this transaction to finish.
			data := <-dataCh
			b, err := tx.CreateBucketIfNotExists([]byte("bucket"))
			if err != nil {
				return err
			}
			return b.Put([]byte("key"), data)
		})
		if err != nil {
			log.Println(err) // Not reached in the deadlock scenario.
		}
	}()

	// Goroutine 2: Wait for Goroutine 1 to be in its transaction, then start another one.
	go func() {
		defer wg.Done()
		// Wait until the first goroutine has the write lock.
		<-inTxCh
		// Now, try to start another write transaction. This will block
		// because Goroutine 1 already holds the write lock.
		err := db.Update(func(tx *bbolt.Tx) error {
			// This code is never reached.
			dataCh <- []byte("value")
			return nil
		})
		if err != nil {
			log.Println(err) // Not reached in the deadlock scenario.
		}
	}()

	// Without this wait the function would return before the goroutines
	// get stuck. With it, the call never returns.
	wg.Wait()
}
- In the above code, the first goroutine will acquire a write lock.
- The second goroutine is supposed to produce the data, but to do so it needs to start its own transaction, which it cannot do until the first goroutine releases the lock.
- The result would be: A deadlock. First goroutine has the lock and needs data from the second. Second goroutine has the data and needs the lock from first.
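One way to break this cycle is to obtain the data before opening the transaction, so that nothing waits on another goroutine while holding the lock. Here is a minimal sketch of that ordering. It is not bbolt code: a plain `sync.Mutex` stands in for the single-writer lock, and `writerLock`, `update`, and `run` are hypothetical names used only for illustration.

```go
package main

import (
	"fmt"
	"sync"
)

// writerLock is a stand-in for a single-writer lock; it is not bbolt's.
var writerLock sync.Mutex

// update runs fn while holding the writer lock, the way a write
// transaction would in this simplified model.
func update(fn func()) {
	writerLock.Lock()
	defer writerLock.Unlock()
	fn()
}

// run receives the value before entering update, so the consumer never
// waits on another goroutine while it holds the lock.
func run() string {
	dataCh := make(chan []byte)
	var wg sync.WaitGroup
	wg.Add(2)

	var stored string

	// Producer: does its locked work first, then sends outside the lock.
	go func() {
		defer wg.Done()
		var value []byte
		update(func() { value = []byte("value") })
		dataCh <- value
	}()

	// Consumer: blocks on the channel without holding the lock,
	// and only then opens its own "transaction".
	go func() {
		defer wg.Done()
		data := <-dataCh
		update(func() { stored = string(data) })
	}()

	wg.Wait()
	return stored
}

func main() {
	fmt.Println(run()) // prints "value"
}
```

With bbolt the same idea would mean reading from `dataCh` before calling `db.Update`, rather than inside the function passed to it.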
Let me know if this example sounds good. I can actually take this issue up to think about and write more examples for the README.
cc: @cenkalti @ahrtr
Another example (with a single goroutine as stated in the README) that I can think of is:
db.Update(func(tx *bbolt.Tx) error {
	// ... do some work ...
	// A nested read transaction
	return db.View(func(txRead *bbolt.Tx) error {
		// This inner transaction can never start because the outer
		// Update transaction holds the write lock, and no other
		// transaction (even a read-only one) can start.
		return nil
	})
})
If you look closely at the implementation: https://github.com/etcd-io/bbolt/blob/85dcf434b106bb7f881799f7a8cd02826d2bd219/db.go#L147
An RWMutex supports both a read lock and a write lock (Ref). As I understand it, in the example above the outer transaction acquires the write lock when it starts, so the inner transaction cannot acquire the read lock until the outer transaction finishes. Since the outer transaction is itself waiting on the inner one, this would lead to a deadlock.
Both of your examples have a read/write transaction as the first or outer transaction, which prevents other transactions. This is the normal and expected type of deadlock. My concern here is about the documentation's warning that read-only transactions can cause deadlocks.
Can you provide an example where the first or outer transaction is a read-only transaction? And possibly an example where both transactions are read only?
You're right. One case (with first transaction being read-only) can be:
// Start a long-running read-only transaction
go func() {
	err := db.View(func(tx *bbolt.Tx) error {
		log.Println("Read transaction started")
		time.Sleep(10 * time.Second) // Simulate a long-running read
		log.Println("Read transaction completed")
		return nil
	})
	if err != nil {
		log.Println("Read error:", err)
	}
}()

// Try to perform a write operation
err = db.Update(func(tx *bbolt.Tx) error {
	log.Println("Write transaction started")
	_, err := tx.CreateBucketIfNotExists([]byte("mybucket"))
	return err
})
This isn't a deadlock as such, but rather a blocking scenario. While the long-running read is active, the write transaction can be blocked waiting to acquire the lock, and it proceeds once the read finishes.
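The README's warning seems to be about the same pair of transactions when they are nested in one goroutine rather than run side by side. A minimal stdlib-only sketch of that case follows. It does not use bbolt: `toyDB`, `writerLock`, `mmapLock`, `view`, and `tryUpdateWithRemap` are hypothetical names, and the model assumes only what the README states, namely that a read-write transaction sometimes needs to re-map the data file and cannot do so while any read-only transaction is open. `TryLock` (Go 1.18+) is used so the program reports the conflict instead of hanging.

```go
package main

import (
	"fmt"
	"sync"
)

// toyDB models only the two locks relevant to this scenario. It is NOT bbolt.
type toyDB struct {
	writerLock sync.Mutex   // held for the lifetime of a read-write transaction
	mmapLock   sync.RWMutex // read-held by every open read-only transaction
}

// view runs fn while holding the mmap lock in shared mode,
// like an open read-only transaction in this model.
func (db *toyDB) view(fn func()) {
	db.mmapLock.RLock()
	defer db.mmapLock.RUnlock()
	fn()
}

// tryUpdateWithRemap models a write transaction that needs to re-map.
// It reports whether exclusive access to the mmap lock was available.
// A blocking Lock() in place of TryLock() would wait until every
// reader has finished.
func (db *toyDB) tryUpdateWithRemap() bool {
	db.writerLock.Lock()
	defer db.writerLock.Unlock()
	if !db.mmapLock.TryLock() {
		return false
	}
	db.mmapLock.Unlock()
	return true
}

// nestedWriteInsideRead opens the write while the read is still open
// in the same goroutine. The reader cannot finish until the writer
// returns, so a blocking re-map here would never complete.
func nestedWriteInsideRead() bool {
	db := &toyDB{}
	ok := false
	db.view(func() { ok = db.tryUpdateWithRemap() })
	return ok
}

// writeAfterRead closes the read before starting the write.
func writeAfterRead() bool {
	db := &toyDB{}
	db.view(func() {})
	return db.tryUpdateWithRemap()
}

func main() {
	fmt.Println("write nested inside read can re-map:", nestedWriteInsideRead()) // false
	fmt.Println("write after read has closed can re-map:", writeAfterRead())     // true
}
```

In this model the nested case only becomes a problem when the write actually needs to re-map, which would explain why the README says "can" rather than "will".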
For a nested read-only situation, I can't think of a scenario where it can possibly deadlock. I will wait for someone with more context to provide an example here.
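In the meantime, one candidate that might be worth checking comes from Go's `sync.RWMutex` itself: its documentation prohibits recursive read locking when another goroutine might call `Lock`, because a queued writer stops new readers from getting in. The sketch below shows that behavior with the standard library only. Whether bbolt's read-only transactions can actually reach this state is an assumption, not something confirmed in this thread, and `secondReadRefusedOnceWriterQueued` is a hypothetical name.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// secondReadRefusedOnceWriterQueued holds one read lock, lets a writer
// queue up behind it, and then checks whether a second read lock can
// still be taken from the same goroutine.
func secondReadRefusedOnceWriterQueued() bool {
	var mu sync.RWMutex
	mu.RLock() // outer "read-only transaction"

	writerDone := make(chan struct{})
	go func() {
		mu.Lock() // queued: waits for the outer read lock to be released
		mu.Unlock()
		close(writerDone)
	}()

	// Poll until the writer is queued. While no writer is waiting,
	// a nested read lock succeeds and is released again immediately.
	for mu.TryRLock() {
		mu.RUnlock()
		runtime.Gosched()
	}

	// The writer is now waiting on the outer read lock. A blocking
	// RLock() at this point would hang, because the writer in turn
	// waits for this goroutine. TryRLock reports the refusal instead.
	refused := !mu.TryRLock()
	if !refused {
		mu.RUnlock()
	}

	mu.RUnlock() // release the outer lock so the writer can finish
	<-writerDone
	return refused
}

func main() {
	fmt.Println("nested read refused while a writer is queued:", secondReadRefusedOnceWriterQueued()) // true
}
```

If this does apply, it would be a case where both transactions in the affected goroutine are read-only and the deadlock still depends on an unrelated writer arriving at the wrong moment.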