Log client: WaitForRootUpdate "stuck" with concurrent calls to AddLeaf
If LogClient.AddLeaf is called concurrently (so that multiple leaves are queued and then processed together in a single pass of the operation manager), only one of the concurrent calls to WaitForRootUpdate returns successfully when LogClient.root is updated.
By the time the other goroutines reach LogClient.UpdateRoot, the client's trusted root has already been updated, so they keep waiting in WaitForRootUpdate until either a new leaf is added or the context expires or is canceled.
This commit reproduces the behavior in a unit test: https://github.com/gpdionisio/trillian/commit/8acb0525a5a389e4818f470980a54e737456beee
This commit is a proposed fix: https://github.com/gpdionisio/trillian/commit/d0f7ad2d3ae7d8685332ae2cf9ae14de91e6060e
I haven't looked into the details, but at a glance this looks similar to https://github.com/google/trillian/pull/3236.
I could be wrong, but I think this is a different issue. There is no deadlock here.
As shown in the test, the trusted root is updated by one goroutine while the others sleep in WaitForRootUpdate. When they wake up and call UpdateRoot, they compare the new root received from the server with the trusted one (which was updated while they were sleeping) and, since the two are the same, they keep waiting for the next root update.
But callers of AddLeaf shouldn't wait for the next update, because their leaf has already been processed.