etcd icon indicating copy to clipboard operation
etcd copied to clipboard

TXN order of requests inside SUCCESS part matters => BUG

Open socketpair opened this issue 1 year ago • 7 comments

What happened?

etcd answers with unexpected ResponseHeader revision values inside TXN response for DeleteRange part

What did you expect to happen?

I expect all revisions in all three places equal (see attached JSON in reproduction section):

If it is expected behavior, PLEASE describe (document) what all the places mean. https://etcd.io/docs/v3.3/learning/api/ does not say anything about meaning of these parts in TXN context.

How can we reproduce it (as minimally and precisely as possible)?

.sh script:

set -e -u -x

etcdctl del --prefix /xxx/
etcdctl -w json txn <<<'
del --prefix /xxx/r/
put /xxx/a b

' | jq

gives (useless fields removed):

{
  "header": {
    "revision":7,
  },
  "responses": [
    {
      "Response": {
        "response_delete_range": {
          "header": {
            "revision": 6
          }
        }
      }
    },
    {
      "Response": {
        "response_put": {
          "header": {
            "revision": 7
          }
        }
      }
    }
  ]
}

Etcd version (please run commands below)

$ etcd --version
etcd Version: 3.5.0
Git SHA: Not provided (use ./build instead of go build)
Go Version: go1.17
Go OS/Arch: linux/amd64


$ etcdctl version
etcdctl version: 3.5.0
API version: 3.5

socketpair avatar Sep 19 '22 18:09 socketpair

Reordering requests in TXN (i.e. put, and then del) makes all revisions equal. Since order of requests inside a single TXN section has never been documented as important, I consider it as a bug.

Seems, Etcd builds responses step-by-step, and on the first step, DB was not modified yet (because delete request changes nothing in provided example). In the end, Etcd builds its topmost revision, which is one more, because DB is changed by PUT request. If so, I would recommend update all the revisions in the end of TXN, not only the topmost one.

https://etcd.io/docs/v3.3/learning/api/

Revision - the revision of the key-value store when generating the response.

It's not clear regarding what it should return in terms of atomicity in context of TXN.

socketpair avatar Sep 19 '22 18:09 socketpair

Thanks @socketpair raising this discussion. It's expected behavior based on current implementation. Both deletion and range operations may have smaller revision based on their position in the list.

For a TXN, the client application should only check the revision in the top level header instead of each sub response's revision.

Does this have any impact on your application?

ahrtr avatar Sep 21 '22 13:09 ahrtr

Yes, it does. I triggered assertion failure in a library. Please fix documentation about applications, that they should not check corresponding revision in txn responses.

And even better, remove these fields from txn response if no one should consider them.

socketpair avatar Sep 21 '22 15:09 socketpair

We have two choice:

  1. Keep it as it's, and only update the doc to mention that for responses corresponding to range and deletion operations may have a smaller revision based on their position in the transaction operation list;
  2. Fix it to make sure all responses in a TxnResponse have the same revision. Does this have any impact on Kubernetes. cc @liggitt @dims

What's your opinion? cc @ptabor @serathius @spzala

ahrtr avatar Sep 21 '22 22:09 ahrtr

I would suggest changing protocol of response if it does not affect old versions of libraries

socketpair avatar Sep 22 '22 03:09 socketpair

2. Fix it to make sure all responses in a TxnResponse have the same revision. Does this have any impact on Kubernetes. cc @liggitt @dims

if the issue is scoped to DeleteRange requests, then it does not impact Kubernetes (we only delete individual keys)

liggitt avatar Sep 22 '22 13:09 liggitt

I did not check, possibly problem also happens if I change deleterange with getrange query.

socketpair avatar Sep 22 '22 18:09 socketpair

  1. Fix it to make sure all responses in a TxnResponse have the same revision. Does this have any impact on Kubernetes. cc @liggitt @dims

if the issue is scoped to DeleteRange requests, then it does not impact Kubernetes (we only delete individual keys)

Thanks for the feedback. I just searched the K8s repo, and confirmed that the proposed change doesn't impact Kubernetes. The only place that K8s might delete a key inside a TXN is store.go#L308, but it doesn't use the revision in the response at all. K8s also uses the range operations inside a TXN in a couple of places, abut they do not use the revision either. So no any impact on K8s.

ahrtr avatar Sep 23 '22 01:09 ahrtr

And even better, remove these fields from txn response if no one should consider them.

Agreed. For a TXN, it doesn't make sense for each inner response have a separate ResponseHeader. Removing it may have impact on existing applications which depends on etcd. Deprecating it might be a feasible approach, but I'd like keep it as it is for now, because it seems not like a big deal.

Correcting the revision in each inner response may have some performance impact on TXN, because we need to iterate all inner responses again.

Points from my side:

  1. Just as I mentioned previously, For a TXN, the client application should only check the revision in the top level header instead of each inner response's header (e.g. revision).
  2. The revision in each inner response is just the the latest revision at the moment when the inner operation is performed.

We can update the doc if there is no any objections.

ahrtr avatar Sep 25 '22 00:09 ahrtr

FYI. https://github.com/etcd-io/website/pull/615

ahrtr avatar Sep 26 '22 07:09 ahrtr

Resolved in https://github.com/etcd-io/website/pull/615

ahrtr avatar Sep 28 '22 21:09 ahrtr