go-control-plane icon indicating copy to clipboard operation
go-control-plane copied to clipboard

Resource is constantly re-sent if NACKed

Open easwars opened this issue 1 year ago • 1 comments

In gRPC-Go, we use the xDS management server from go-control-plane for a lot of our tests.

The way we set things up is as follows:

  • We create a snapshot cache here: https://github.com/grpc/grpc-go/blob/c9caa9ed53d77155648ba8bbf2ae24b6f4fe19db/internal/testutils/xds/e2e/server.go#L139C2-L139C7
  • We create the server here (and pass it the above cache): https://github.com/grpc/grpc-go/blob/c9caa9ed53d77155648ba8bbf2ae24b6f4fe19db/internal/testutils/xds/e2e/server.go#L163
  • We start serving here: https://github.com/grpc/grpc-go/blob/c9caa9ed53d77155648ba8bbf2ae24b6f4fe19db/internal/testutils/xds/e2e/server.go#L185

From our tests, when we have to configure resources on the management server:

  • We create a new snapshot with the resources here: https://github.com/grpc/grpc-go/blob/c9caa9ed53d77155648ba8bbf2ae24b6f4fe19db/internal/testutils/xds/e2e/server.go#L219
  • We set the snapshot on the server here: https://github.com/grpc/grpc-go/blob/c9caa9ed53d77155648ba8bbf2ae24b6f4fe19db/internal/testutils/xds/e2e/server.go#L232

We have noticed that when the xDS client on our side NACKs a resource from the management server, it resends it immediately, which is NACKed again by the xDS client, and this loop continues forever.

Is there something wrong with the way we use the xDS management server?

easwars avatar Jul 01 '24 21:07 easwars

From an xDS transport protocol perspective, I don't think it's useful for the xDS server to re-send a resource when the client NACKs it. If the client NACKs it, that means that the resource is inherently invalid, and re-sending it without modifying its contents is not going to change that -- the client will just NACK again.

I think it's useful for the xDS server to have a mechanism for reporting when clients NACK, since that is useful information. But it should not affect the actual logic of the xDS server code otherwise. The server should not treat a NACK any differently than an ACK: in either case, it should not re-send until it gets new contents for the resource.

markdroth avatar Jul 01 '24 22:07 markdroth

This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged "help wanted" or "no stalebot" or other activity occurs. Thank you for your contributions.

github-actions[bot] avatar Aug 01 '24 00:08 github-actions[bot]

Can one of the owners/maintainers please take a look? Thanks.

easwars avatar Aug 01 '24 18:08 easwars

@markdroth While I agree, this is how the managed control planes work, too (e.g. Istio, TD). It used to handle a legitimate "eventual consistency" use case where parts of config are pushed outside xDS (e.g. some cert file or Wasm module). There's no true rollback. What can help is putting a rate limiter, which you can configure on the client side in Envoy "ads" configuration.

kyessenov avatar Aug 01 '24 20:08 kyessenov

Sorry for the delayed reply. As mentioned by @kyessenov, we have use-cases when NACK do succeed on retry (the main one we've seen is listeners failing to bind at first then succeeding on retry). While those use-cases are for sure not great, this change of behavior can be impactful for users. I'm not against changing to another model, but we'd have to be careful on the user impact

valerian-roche avatar Aug 01 '24 20:08 valerian-roche

This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged "help wanted" or "no stalebot" or other activity occurs. Thank you for your contributions.

github-actions[bot] avatar Sep 01 '24 00:09 github-actions[bot]

This issue has been automatically closed because it has not had activity in the last 37 days. If this issue is still valid, please ping a maintainer and ask them to label it as "help wanted" or "no stalebot". Thank you for your contributions.

github-actions[bot] avatar Sep 08 '24 04:09 github-actions[bot]