google-cloud-cpp icon indicating copy to clipboard operation
google-cloud-cpp copied to clipboard

Consider using retry policy for OK streams with failing mutations in `bigtable::BulkApply()`

Open dbolduc opened this issue 4 years ago • 1 comments

The current functionality is described here: https://github.com/googleapis/google-cloud-cpp/blob/b2efc81dca112d7bf9358e6db0e859da9d6195fc/google/cloud/bigtable/table.h#L418-L422

But I think that a LimitedErrorCountRetryPolicy of 2 should mean we only ever send 2 RPCs, regardless of whether the errors are from the overall stream or from individual mutations. For example, this test currently passes, but I do not think it should.

There is some ambiguity in the case where a mutation fails and the stream fails with different status codes. I think in this case, we should continue to use the stream's status code for the retry policy. There is also ambiguity if two mutations fail with different transient errors, but the stream succeeds. Which error is passed to the retry policy?

dbolduc avatar Oct 16 '21 14:10 dbolduc

Bigtable team says this is a bug. We should backoff in between stream attempts, and count OK streams with failing mutations towards the retry policy.

dbolduc avatar Aug 10 '22 20:08 dbolduc