envoy
envoy copied to clipboard
[delta-XDS] delta-XDS state tracking is broken when there is an exception thrown when watchmap loading the config.
Title: state tracking is broken when there is a partial accept
Description: if there is a partial accept in Delta-XDS, i.e. route-table #50's loading throws an exception, out of 100 routeConfigurations in the batch, the RDS state is updated with the other 50 route tables not yet updated/inserted.
Even reconnection won't fix this issue, as the client believes it has the 50 not loaded route tables with some version already.
{
const auto scoped_update = ttl_.scopedTtlUpdate();
if (requested_resource_state_.contains(Wildcard)) {
for (const auto& resource : message.resources()) {
addResourceStateFromServer(resource);
}
} else {
// We are not subscribed to wildcard, so we only take resources that we explicitly requested
// and ignore the others.
for (const auto& resource : message.resources()) {
if (requested_resource_state_.contains(resource.name())) {
addResourceStateFromServer(resource);
}
}
}
}
watch_map_.onConfigUpdate(non_heartbeat_resources, message.removed_resources(),
message.system_version_info());
Repro steps: See description.
/cc @yanavlasov Please add a no-stale-bot to this issue.
One way to resolve this would be tracking state of resources using the actual per-resource subscriptions, with the state dictionary taking resource_name->reference to the source of truth. Requires quite some plumbing tho.
cc @adisuissa
Envoy should probably delay the update of the tracked resource version (for non-wildcard resource and for wildcard resource) until after the call to the watch_map_.onConfigUpdate() method.
Envoy should probably delay the update of the tracked resource version (for non-wildcard resource and for wildcard resource) until after the call to the
watch_map_.onConfigUpdate()method.
Note that this impacts a reconnect scenario, as Envoy will send an initial_resource_version with resources and versions that may have been NACK'ed.