grpc-java
xds: Improve error message for XdsServer listener not found
Running on master and using TD, if I get the endpoint-selector wrong (or it doesn't exist), I get:
Nov 02, 2021 12:30:50 AM io.grpc.xds.XdsServerBuilder$DefaultListener onNotServing
WARNING: [port:50051] NOT_FOUND: Requested entity was not found.
That's not helpful in the least. From a grpc-developer standpoint I can make sense of that after some assumptions, but for a user it is pretty useless except for Googling and hoping there's something on SO.
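For context, the warning is emitted by XdsServerBuilder's default serving-status listener (the XdsServerBuilder$DefaultListener in the log), and the server side is presumably just the stock xDS setup, roughly like this sketch (the port is a placeholder matching the log, services omitted):

```java
import io.grpc.InsecureServerCredentials;
import io.grpc.Server;
import io.grpc.xds.XdsServerBuilder;
import io.grpc.xds.XdsServerCredentials;

public class XdsExampleServer {
  public static void main(String[] args) throws Exception {
    // Placeholder port matching the log above; real services omitted for brevity.
    Server server = XdsServerBuilder
        .forPort(50051, XdsServerCredentials.create(InsecureServerCredentials.create()))
        .build()
        .start();
    // If the control plane never returns a matching Listener resource for this port,
    // the default XdsServingStatusListener logs the NOT_FOUND warning shown above.
    server.awaitTermination();
  }
}
```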
I do think this is a difficult message though, because the client doesn't actually know what's going on. Maybe there is a port mismatch. Maybe there is a metadata mismatch. Maybe they didn't include metadata in the bootstrap. Maybe the endpoint policy was matching SIDECAR_PROXY instead of GRPC_SERVER. Or maybe it just hasn't propagated through the control plane yet. So there's little hope of including everything the user needs to look at to resolve this.
So I think the best thing to do is just include the context of the error. "Failed to find listening configuration for address '0.0.0.0:50051': Requested entity was not found." Or similarly "Failed to find listening configuration 'grpc/server?xds.resource.listening_address=0.0.0.0:50051': Requested entity was not found." (I prefer the latter.) We might also suggest "check your bootstrap" but... dunno.
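A sketch of what that could look like, assuming a small helper somewhere around XdsServerWrapper (the helper and variable names here are illustrative, not existing grpc-java code):

```java
import io.grpc.Status;

final class ListenerErrors {
  // Hypothetical helper: wrap the Status received from the XdsClient with the
  // Listener resource name that was being looked up, so the user can see which
  // lookup failed before the status is handed to the serving-status listener.
  static Status withListenerContext(Status error, String listenerResourceName) {
    return Status.fromCode(error.getCode())
        .withDescription("Failed to find listening configuration '" + listenerResourceName
            + "': " + error.getDescription())
        .withCause(error.getCause());
  }
}
```

Called with 'grpc/server?xds.resource.listening_address=0.0.0.0:50051' and the NOT_FOUND status from the log above, this would produce exactly the second wording.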
CC @sanjaypujare @YifeiZhuang
Yes, makes sense. I too prefer the second one "Failed to find listening configuration 'grpc/server?xds.resource.listening_address=0.0.0.0:50051': Requested entity was not found."
But when I last tested with TD, even if there is no endpoint-policy (aka endpoint-config-selector), or if there is a port mismatch (or some such thing), TD sends a default listener, and that was the intended behavior. So I am a bit surprised you ran into this - unless this is happening after we implemented the additional validations on the received response. Regardless, the error message should be fixed. Will coordinate with @YifeiZhuang to fix this.
@ejona86 good find. Maybe the debug log messages gave a better clue? To me it would be helpful if it said something like "XdsClient: Failed to find listening configuration .... from the xDS control plane."
This reveals a problem: we don't know what TD's error message will be, yet we simply pass it through to our user as-is. Beyond XdsServerWrapper, this affects all other xDS resources too.
Also seen:
[port:8080] UNAVAILABLE: Credentials failed to obtain metadata
This one isn't that bad, since auth failures should impact all the resources equally. Although it does appear that we didn't log the cause which was attached... Hmm...
Can you attach the stack trace?
We didn't log any stacktrace there. I just know there was one attached because I found the one place that string is used, and it always attaches an exception.
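In the meantime, a user who wants to see the attached exception can install their own serving-status listener instead of the default one; a rough sketch (class and logger names are illustrative):

```java
import io.grpc.xds.XdsServerBuilder.XdsServingStatusListener;
import java.util.logging.Level;
import java.util.logging.Logger;

final class LoggingStatusListener implements XdsServingStatusListener {
  private static final Logger logger =
      Logger.getLogger(LoggingStatusListener.class.getName());

  @Override
  public void onServing() {
    logger.info("xDS server is serving");
  }

  @Override
  public void onNotServing(Throwable throwable) {
    // Unlike the default listener, pass the Throwable so the full cause chain
    // (e.g. the 403 from the metadata server below) ends up in the log.
    logger.log(Level.WARNING, "xDS server not serving", throwable);
  }
}

// Installed via XdsServerBuilder, e.g.:
//   XdsServerBuilder.forPort(port, serverCredentials)
//       .xdsServingStatusListener(new LoggingStatusListener())
```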
I happened to stumble across the same error logged from a different spot, while I was doing something else (this is a DEBUG log):
Transient error from XdsClient: Status{code=UNAVAILABLE, description=Credentials failed to obtain metadata, cause=java.io.IOException: Unexpected Error code 403 trying to get security access token from Compute Engine metadata for the default service account: Unable to generate access token; IAM returned 403 Forbidden: The caller does not have permission
This error could be caused by a missing IAM policy binding on the target IAM service account.
For more information, refer to the Workload Identity documentation:
https://cloud.google.com/kubernetes-engine/docs/how-to/workload-identity#authenticating_to
    at com.google.auth.oauth2.ComputeEngineCredentials.refreshAccessToken(ComputeEngineCredentials.java:142)
    at com.google.auth.oauth2.OAuth2Credentials.refresh(OAuth2Credentials.java:160)
    at com.google.auth.oauth2.OAuth2Credentials.getRequestMetadata(OAuth2Credentials.java:147)
    at com.google.auth.Credentials.blockingGetToCallback(Credentials.java:112)
    at com.google.auth.Credentials$1.run(Credentials.java:98)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:834)
}
Yes, the attached exception comes from com.google.auth.RequestMetadataCallback.onFailure(). I noticed that too.
That stack trace is from the Google Auth library. I was hoping to see the stack trace showing the grpc-xds code call stack (mainly ClientXdsClient code).