transport: restore default timeout behavior for MDS universe_domain
Revert #2393 to restore the previous, spec-compliant behavior of calls to Credentials.GetUniverseDomain in the transport package.
This should be done once the underlying MDS issue causing https://github.com/googleapis/google-cloud-go/issues/9350 and similar is resolved.
I am also running the Google Cloud client libraries on GKE. The library sends requests to /computeMetadata/v1/universe/universe_domain, which does not appear to be supported by the GKE metadata server (see b/325999688), so the endpoint returns 404. This produces nuisance errors in Cloud Logging.
I don't think there is any functional impact to the app. (We do not even appear to be waiting for the 1s timeout implemented as a mitigation for https://github.com/googleapis/google-cloud-go/issues/9350, because the library correctly treats 404 as a non-retriable error.)
@patrickmeiring I think this issue is the wrong place to report that the 404 responses from MDS are a nuisance in Cloud Logging. As you noted, the timeout behavior is (or at least should be) unrelated to the handling of 404 errors.
To determine where to report this issue, can you confirm whether MDS is available at all in your GKE environment? Or is some other form of auth used? I know this sounds obvious, but if MDS is indeed present, then I think the source of the 404 nuisance errors is the lack of support for the universe domain endpoint in MDS in your particular GKE environment.
MDS = GCE Metadata Server, correct?
I believe in our environment, GKE metadata server (part of GKE Workload Identity feature [1]) is receiving and handling these requests, not the GCE Metadata server. Or not directly, anyway.
> I think this issue is the wrong place to report that the 404 responses from MDS are a nuisance in Cloud Logging.
You're right. I actually filed the Buganizer ticket (above: b/325999688) for the 404s from the GKE Metadata Server. I'm mostly sharing it here to ensure you have full visibility into how this library is behaving on GKE.
The original issue reporter was on GKE too, but they were getting different behaviour -- possibly because they were not using Workload Identity, or for other reasons?
[1] https://cloud.google.com/kubernetes-engine/docs/how-to/workload-identity#using_from_your_code
@quartzmo I think we can close this now with the most recent auth changes?
Yes, I'll close this now.