grpc-go
grpc-go copied to clipboard
"GCE" metadata check is fragile
See also #3666
Google Container Engine, Google Cloud Run, Google App Engine, and possibly also Google Cloud Functions all provide the "GCE" metadata server. However, gRPC checks explicitly for exact string matches of Google
or Google Compute Engine
when running on linux by inspecting the contents of the /sys/class/dmi/id/product_name
file.
The runtime contracts of none of the non-GCE execution platforms state that they will provide either of these values in /sys/class/dmi/id/product_name
. See https://issuetracker.google.com/issues/158533357 for a specific case where it was intentionally changed by product decision, and then causes breakage in gRPC clients.
This issue is not specific to Go. There are equivalent issues in Java and probably every other auth library dealing with application default credentials which on GCP rely on the metadata servers. Given that the official API guidance is to rely on our client libraries, that the client libraries have baked in a narrow check makes it incredibly difficult to make these changes without pinning the product to legacy behavior, even if a PR ends up relaxing the check.
/cc @krishna-seerapu @steren
Would using https://pkg.go.dev/cloud.google.com/go/compute/metadata#OnGCE solve the problem?
@menghanl systemInfoSuggestsGCE
uses the exact same logic. On the other hand if that were used, then a service could explicitly set then environment variable GCE_METADATA_HOST
to the normal value 169.254.169.254
, which would escape out of the testing logic. That said, the comments look like that environment variable is specific to the Go compute client, and might not be generally available across other language clients (or clearly, across all Go clients).
Even if we did change the way that GCE detection is done, there are already a bunch of clients in the wild (e.g. from older gRPC releases) that depend on this.
So if an existing platform changes behavior and breaks it, we would still have problems - @Capstan has that been considered?. I.e. I'm wondering how changing this in new client releases would solve anything.
@apolcyn It would not solve the breaking of extant services. It would however allow services to selectively change their behavior to an improved reporting model of what actual service is running. It remains, AFAIK, an open question about how to roll that out safely.
In order to help client libraries know the runtime they are running on, Google wanted Google Cloud Run, Google App Engine and Google Cloud Functions to return more accurate product names at /sys/class/dmi/id/product_name
. This change is now on hold due to downstream impact on grpc-go
and gax-java
.
My suggestion would be for grpc-go
to relax this check and instead of looking for a hardcoded list of values, to only check that the product_name
string starts with Google
. After some time (months? years?), Google Cloud runtimes might be able to switch (carefully) to more accurate product names.
#5348 changes OnGCE
to check for the prefix. A separate issue should be filed to update pkg.go.dev/cloud.google.com/go/compute/metadata#OnGCE
We can switch to metadata.OnGCE()
if necessary. The concern is that it pulls in a big google cloud dependency to gRPC's main module.
Assigning to @menghanl since you have a PR open for this.
A separate issue should be filed to update pkg.go.dev/cloud.google.com/go/compute/metadata#OnGCE
Filed https://github.com/googleapis/google-cloud-go/issues/6037
I am not sure any changes should be made here without going to an update to aip.dev. This kind of change should be driven from a design that all of the languages we support can implement.
Update: I will bring this up at our auth meeting later this week.
In that case, we will keep #5348 open until a decision is made.
I did bring this up with relevant parties today, still more discussion to be had. I think we should wait for changes to official docs or until we have a plan forward before we update what BIOS values to check.
Based on the updates on https://github.com/googleapis/google-cloud-go/issues/6037, I don't think we should do anything here until there is broader agreement.