bbolt

Additional documentation/examples related to transaction deadlock

Open: sparr opened this issue 2 years ago • 5 comments

https://github.com/etcd-io/bbolt#transactions provides some warnings about potential deadlock scenarios. However, this documentation uses a lot of uncertain language like "should", "generally", "can", etc.

It would be useful to have more certainty around these scenarios. More explicit narrative documentation could help, but my preference would be concrete examples.

For example, at least "These example snippets will always cause a deadlock" and "... will never cause a deadlock", and possibly also "... will cause a deadlock if additional condition X is [not] met", with enough examples of each to cover a variety of cases involving multiple transactions, threads, etc.

sparr, Jun 19 '23 17:06

Thanks for the reopen. I actually lost my job over not being able to explain when this would or would not deadlock, so seeing more docs here has some personal value for me.

sparr, Apr 10 '25 15:04

Hey @sparr, thanks for raising the issue. I have been reading through the README and came up with one deadlock scenario:

  • Let's say we have two goroutines that depend on each other: one of them needs data that is produced by the other. In terms of code:
package main

import (
    "log"
    "os"
    "path/filepath"
    "sync"

    "go.etcd.io/bbolt"
)

func InduceDeadlockExample() {
    // Throwaway database in a fresh temporary directory.
    dir, err := os.MkdirTemp("", "bbolt-deadlock")
    if err != nil {
        log.Fatal(err)
    }
    defer os.RemoveAll(dir)

    db, err := bbolt.Open(filepath.Join(dir, "example.db"), 0600, nil)
    if err != nil {
        log.Fatal(err)
    }
    defer db.Close()

    // Channel to signal that the first goroutine is inside the transaction.
    inTxCh := make(chan struct{})
    // Channel for the second goroutine to send data to the first.
    dataCh := make(chan []byte)

    var wg sync.WaitGroup
    wg.Add(2)

    // Goroutine 1: start a write transaction, then wait for data.
    go func() {
        defer wg.Done()
        err := db.Update(func(tx *bbolt.Tx) error {
            // Signal that we are inside the transaction (holding the writer lock).
            close(inTxCh)

            // Wait for data from the other goroutine. This blocks forever,
            // because the other goroutine can only send after this
            // transaction finishes.
            data := <-dataCh

            b, err := tx.CreateBucketIfNotExists([]byte("bucket"))
            if err != nil {
                return err
            }
            return b.Put([]byte("key"), data)
        })
        if err != nil {
            log.Println("goroutine 1:", err) // not reached: deadlocked above
        }
    }()

    // Goroutine 2: wait until goroutine 1 is in its transaction, then start another.
    go func() {
        defer wg.Done()

        // Wait until the first goroutine holds the writer lock.
        <-inTxCh

        // This blocks: bbolt allows only one read-write transaction at a
        // time, and goroutine 1 already holds the writer lock.
        err := db.Update(func(tx *bbolt.Tx) error {
            // This code is never reached.
            dataCh <- []byte("value")
            return nil
        })
        if err != nil {
            log.Println("goroutine 2:", err) // not reached: deadlocked above
        }
    }()

    // Without this Wait the function would return and silently leak both
    // goroutines. With it, the call never returns; in a small program doing
    // nothing else, the Go runtime may abort with
    // "all goroutines are asleep - deadlock!".
    wg.Wait()
}
  • In the above code, the first goroutine acquires the write lock (bbolt allows only one read-write transaction at a time).
  • The second goroutine is supposed to produce the data, but to do so it must start its own write transaction, which can't happen until the first goroutine releases the lock.
  • The result is a deadlock: the first goroutine holds the lock and needs data from the second, while the second can only send that data after it gets the lock from the first.

Let me know if this example sounds good. I can take this issue on and write more examples for the README.

cc: @cenkalti @ahrtr

greenblade29, Aug 16 '25 16:08

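Another example (with a single goroutine, as described in the README) that I can think of is a read-only transaction nested inside a write transaction:

db.Update(func(tx *bbolt.Tx) error {
    // ... do some work ...

    // A nested read-only transaction in the same goroutine
    return db.View(func(txRead *bbolt.Tx) error {
        // This inner transaction does start: the outer Update holds the
        // single-writer lock, which read-only transactions never take.
        // Until it returns, though, it holds the mmap lock for reading,
        // which would block any re-map of the data file.
        return nil
    })
})

If you look closely at the implementation: https://github.com/etcd-io/bbolt/blob/85dcf434b106bb7f881799f7a8cd02826d2bd219/db.go#L147

bbolt uses two separate locks here. A sync.Mutex (rwlock) allows only one writer at a time; read-only transactions never take it, so the inner View above starts normally. A sync.RWMutex (mmaplock) can be held by many readers or by one writer (Ref): every open read-only transaction holds it for reading, and a write transaction takes it exclusively when its commit has to re-map the data file because the database grew. A deadlock therefore needs two things at once: a read-only transaction still open in the same goroutine, and a write commit large enough to force a re-map. The re-map then waits for that reader, and the reader can only be closed after the commit returns. In the snippet above the View returns before the Update commits, so it does not deadlock by itself; keeping a db.Begin(false) transaction open across an Update, or calling db.Update inside a db.View callback, can.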

greenblade29, Aug 17 '25 07:08

Both of your examples have a read/write transaction as the first or outer transaction, holding the writer lock. That is the normal and expected type of deadlock. My concern here is about the documentation's warning that read-only transactions can cause deadlocks.

Can you provide an example where the first or outer transaction is a read-only transaction? And possibly an example where both transactions are read-only?

sparr, Aug 17 '25 12:08

You're right. One case, with the first transaction being read-only:

// Start a long-running read-only transaction
readStarted := make(chan struct{})
go func() {
    err := db.View(func(tx *bbolt.Tx) error {
        log.Println("Read transaction started")
        close(readStarted)
        time.Sleep(10 * time.Second) // Simulate a long-running read
        log.Println("Read transaction completed")
        return nil
    })
    if err != nil {
        log.Println("Read error:", err)
    }
}()

// Wait until the read transaction is open, then perform a write
<-readStarted
err = db.Update(func(tx *bbolt.Tx) error {
    log.Println("Write transaction started")
    _, err := tx.CreateBucketIfNotExists([]byte("mybucket"))
    return err
})

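This isn't a deadlock as such. In most cases the write does not even wait: bbolt allows one read-write transaction to run alongside any number of open read-only transactions, so "Write transaction started" is normally logged while the read is still sleeping. The write blocks only if its commit has to re-map a grown data file, because a re-map waits for every open read-only transaction; it then waits until the 10-second read finishes and completes afterwards. That is blocking, not a deadlock.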

For a nested read-only situation, I can't think of a scenario where it can possibly deadlock. I will wait for someone with more context to provide an example here.

greenblade29, Aug 17 '25 13:08