anthos-service-mesh-packages
Installation script does not validate whether https://meshconfig/.../initialize was called
When running the script without --enable-all, the installation succeeds if roles, APIs, etc. look good. However, the service mesh is not installed properly because https://meshconfig/.../${PROJECT_ID}:initialize is not called.
The error logged in the istio pilot container is:
2021-04-21T20:11:00.563187Z info Failed to export to Stackdriver: rpc error: code = Unauthenticated desc = transport: failed to exchange access token: access token response does not have access token. {
"error": {
"code": 404,
"message": "Requested entity was not found.",
"status": "NOT_FOUND"
}
}
This is a pretty unfortunate edge case. For all of the other ASM dependencies there are pairs of "enable this" or "validate that it's enabled, fail if not." For this one specific call, there's no way (yet, afaik) to check up front whether it's been initialized or not. I debated printing a warning every time someone ran install_asm without --enable-gcp-components, but it felt like that would get annoying real quick.
If you think a warning would be better I'm glad to add it, or any other ideas to make the UX smoother.
Yeah — I hear you! I think a warning plus an update to the Anthos SM documentation would be awesome!
Right now it seems like one could expect the installation to succeed without adding any of the enable flags (assuming all the roles, APIs, etc. are enabled and in place), but that doesn't seem to be the case because of this issue.
Is there a simple way to make this call without using the script? We are intentionally not using the enable flags, so we can document what services/apis/roles etc are required.
Sure, we can add a warning. I'll pass the documentation request up the chain.
Re: simple way to make the call--there's certainly a way to make the call, I'm not sure if I'd call it simple in every case. If you're using 1.8 or 1.9 and no preview features, then you should be able to just POST to https://meshconfig.googleapis.com/v1alpha1/projects/${PROJECT_ID}:initialize with an empty payload and specify the Authorization: Bearer ${TOKEN} header, where TOKEN is the output of gcloud auth print-access-token.
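For anyone documenting this, the call described above could be scripted roughly as follows. This is a minimal sketch, not what install_asm itself runs: the function names meshconfig_init_url and init_meshconfig are made up for illustration, the Content-Type header is an assumption, and it assumes gcloud and curl are on the PATH and the meshconfig API is already enabled on the project. Since the endpoint is alpha, it may become unnecessary later.

```shell
#!/bin/sh
# Hypothetical helpers; these names do not come from install_asm.

# Print the v1alpha1 initialize URL for the project ID given as $1.
meshconfig_init_url() {
  printf 'https://meshconfig.googleapis.com/v1alpha1/projects/%s:initialize' "$1"
}

# POST to the initialize endpoint.
#   $1 = project ID
#   $2 = optional JSON payload (defaults to an empty payload)
# The body runs in a subshell so TOKEN does not leak into the caller.
init_meshconfig() (
  TOKEN="$(gcloud auth print-access-token)" || exit 1
  curl --fail --silent --show-error --request POST \
    --header "Authorization: Bearer ${TOKEN}" \
    --header "Content-Type: application/json" \
    --data "${2:-}" \
    "$(meshconfig_init_url "$1")"
)

# Usage (not run here): init_meshconfig "${PROJECT_ID}"
```

With --fail, curl exits non-zero on an HTTP error, so a wrapper script can stop early instead of discovering the problem later in the pod logs.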
The reason that particular call is alpha even though the entire API isn't is because soon(tm) it won't be necessary, and just enabling the meshconfig API will be enough to kick off all of the machinery necessary. I guess concretely: when you do document it, add a comment to the effect of "this might be unnecessary later so let's not take any hard dependencies on it."
If you just want to separate out dependencies for different roles, you can use the flag combination of --only-enable --enable-gcp-components to set up meshconfig, Workload Identity, and Stackdriver without doing anything else. It doesn't help your documentation much, but it might help if the people installing ASM aren't the same ones with permissions to make project-level changes.
If I were using preview features (in this case I've tried --option hub-meshca, although using Environ workload identity isn't a dealbreaker for me, so I'm tempted to just turn it off; reading the documentation, it sounded like it was the future of workload identity though), do I need to do something different with the initialize call? (I've just tried this and I now get a slightly different error in the gateway pod, with a permission denied.)
2021-04-23T16:47:42.101021Z error token access token response does not have access token{
"error": {
"code": 403,
"message": "The caller does not have permission",
"status": "PERMISSION_DENIED"
}
}
2021-04-23T16:47:42.101048Z warn stsserver token manager fails to generate token: access token response does not have access token. {
"error": {
"code": 403,
E0423 16:47:42.101290625 34 oauth2_credentials.cc:157] Call to http server ended with error 500 [{
"error": "invalid_target",
"error_description": "access token response does not have access token. {\n \"error\": {\n \"code\": 403,\n \"message\": \"The caller does not have permission\",\n \"status\": \"PERMISSION_DENIED\"\n }\n}\n",
"error_uri": ""
}].
CreateTimeSeries request failed (1 RPCs, 16 views, 4 timeseries): UNAVAILABLE: Error occurred when fetching oauth2 token.
"message": "The caller does not have permission",
"status": "PERMISSION_DENIED"
}
}
I'm guessing this is because I'm trying to use a preview feature here?
I might have solved my own problem digging through the script. I think I need to use this for the POST data rather than an empty block:
local POST_DATA; POST_DATA='{"workloadIdentityPools":["'${ENVIRON_PROJECT_ID}'.hub.id.goog","'${ENVIRON_PROJECT_ID}'.svc.id.goog"]}'
Yep! That should be simplified soon as well, those two pools should both end up under the svc.id.goog name and whether it's Environ or not will get inferred depending on the context. (e.g. if the cluster is registered then the WI is Environ-wide, etc.)
Standard caveat that it's preview, nothing is concrete until it's released, etc. etc.
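For the preview (hub-meshca) case, the payload quoted above could be built and sent along these lines. Again, this is only a sketch under assumptions: environ_post_data and init_meshconfig_environ are hypothetical names, the Content-Type header is an assumption, and the payload shape is simply the one copied from the script earlier in this thread, so it may change once the feature leaves preview.

```shell
#!/bin/sh
# Hypothetical helpers; these names do not come from install_asm.

# Print the JSON body listing both workload identity pools.
#   $1 = Environ project ID
environ_post_data() {
  printf '{"workloadIdentityPools":["%s.hub.id.goog","%s.svc.id.goog"]}' "$1" "$1"
}

# POST the pools payload to the initialize endpoint.
#   $1 = project ID, $2 = Environ project ID
# The body runs in a subshell so TOKEN does not leak into the caller.
init_meshconfig_environ() (
  TOKEN="$(gcloud auth print-access-token)" || exit 1
  curl --fail --silent --show-error --request POST \
    --header "Authorization: Bearer ${TOKEN}" \
    --header "Content-Type: application/json" \
    --data "$(environ_post_data "$2")" \
    "https://meshconfig.googleapis.com/v1alpha1/projects/$1:initialize"
)

# Usage (not run here):
#   init_meshconfig_environ "${PROJECT_ID}" "${ENVIRON_PROJECT_ID}"
```

Building the body with printf avoids the nested quoting of the one-liner, at the cost of not escaping anything; that should be fine for project IDs, which contain only lowercase letters, digits, and hyphens.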
Ok great, thanks for the help :)