firestore: Transaction "context canceled" error leads to write in firestore
Live firestore client version: v1.6.1 Tested locally on: v1.6.1 > v1.13.0
We noticed weird behavior in our live environment. Writes and updates that were in firestore that wasn't supposed to be there since they were handled in a transaction which errored out. I threw together a small test locally using the firestore emulator and can reproduce the error, might need some tinkering with timings to get it to work on other machines however.
package main
func main() {
// ... client setup
wg := sync.WaitGroup{}
nCalls := 10
startChan := make(chan struct{})
now := time.Now()
docID := fmt.Sprintf("testdoc_%d", time.Now().Unix())
for i := 0; i < nCalls; i++ {
wg.Add(1)
go func(i int) {
<-startChan
err := client.RunTransaction(ctx, func(ctx context.Context, tx *firestore.Transaction) error {
docRef := client.Collection("testbubbel").Doc(docID)
_, err := tx.Get(docRef)
if err != nil {
if status.Code(err) != codes.NotFound {
return err
}
}
// fmt.Printf("worker: %d: data: %s\n", i, doc.Data())
if err := tx.Set(docRef, map[string]string{"foo": fmt.Sprintf("bar_%d", i)}); err != nil {
return err
}
// fmt.Printf("returning %d\n", i)
return nil
})
fmt.Printf("err from worker %d: %+v\n", i, err)
wg.Done()
}(i)
}
close(startChan)
time.Sleep(2300 * time.Millisecond)
cancel()
wg.Wait()
fmt.Printf("time: %s\n", time.Since(now))
}
Output:
err from worker 1: rpc error: code = Canceled desc = context canceled
err from worker 2: rpc error: code = Canceled desc = context canceled
err from worker 6: rpc error: code = Canceled desc = context canceled
err from worker 8: rpc error: code = Canceled desc = context canceled
err from worker 0: rpc error: code = Canceled desc = context canceled
err from worker 7: rpc error: code = Canceled desc = context canceled
err from worker 9: rpc error: code = Canceled desc = context canceled
err from worker 5: rpc error: code = Canceled desc = context canceled
err from worker 3: rpc error: code = Canceled desc = context canceled
err from worker 4: rpc error: code = Canceled desc = context canceled
time: 2.301270541s
Expected behavior
- Since all transactions resulted in an error, I expect there to be no update
- The call that leads to context canceled error should return no error if it has reached a point of no return
Actual behavior
Some transactions are pushed through even if there is an error (intermittently).
Additional context
I am aware that the emulator doesn't behave exactly as the live environment but since we saw the issue first in our live env and I could get the behavior to reproduce in the emulator I'm quite convinced this issue lies in the client.
Note that this is an intermittent error, not all calls that return a context canceled error leads to a write so we can't trust that a write has happened if that is the error
Update
I tried setting the nCount to 1 and could reproduce it by setting the cancel to 50 milliseconds, so this is not something that needs parallel calls to happen
Triaged & tagged.
This is by design. Transactions are not rollbacked on context cancelled error.
@bhshkh How can intermittent behavior be expected? how am I supposed to know if a dataset was updated or not when the error provides no additional context other than a “context cancelled” error?
the workaround is to create a new context for all transactions since we can’t rely on your library’s handling of cancellations
Reducing the priority s there is a workaround available. Will look into the issue soon