anthos-service-mesh-packages

Installation script does not validate whether https://meshconfig/.../initialize was called

Open alexbrand opened this issue 3 years ago • 8 comments

When running the script without --enable-all, the installation succeeds as long as the roles, APIs, etc. look good. However, the service mesh is not installed properly because https://meshconfig/.../${PROJECT_ID}:initialize is never called.

The error logged in the istio pilot container is:

2021-04-21T20:11:00.563187Z	info	Failed to export to Stackdriver: rpc error: code = Unauthenticated desc = transport: failed to exchange access token: access token response does not have access token. {
  "error": {
    "code": 404,
    "message": "Requested entity was not found.",
    "status": "NOT_FOUND"
  }
}

alexbrand avatar Apr 22 '21 00:04 alexbrand

This is a pretty unfortunate edge case. For all of the other ASM dependencies there are pairs of "enable this" or "validate that it's enabled, fail if not." For this one specific call, there's no way (yet, afaik) to check up front whether it's been initialized or not. I debated printing a warning every time someone ran install_asm without --enable-gcp-components but it felt like that would get annoying real quick.

If you think a warning would be better I'm glad to add it, or any other ideas to make the UX smoother.

zerobfd avatar Apr 22 '21 22:04 zerobfd

Yeah — I hear you! I think a warning plus an update to the Anthos SM documentation would be awesome!

Right now it seems like one could expect the installation to succeed without adding any of the enable flags (assuming all the roles, APIs, etc. are enabled and in place), but that isn't the case because of this issue.

alexbrand avatar Apr 23 '21 14:04 alexbrand

Is there a simple way to make this call without using the script? We are intentionally not using the enable flags, so we can document what services/apis/roles etc are required.

NickAJScott avatar Apr 23 '21 14:04 NickAJScott

Sure, we can add a warning. I'll pass the documentation request up the chain.

Re: simple way to make the call--there's certainly a way to make the call, I'm not sure if I'd call it simple in every case. If you're using 1.8 or 1.9 and no preview features, then you should be able to just POST to https://meshconfig.googleapis.com/v1alpha1/projects/${PROJECT_ID}:initialize with an empty payload and specify the Authorization: Bearer ${TOKEN} header where TOKEN is the output of gcloud auth print-access-token.
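A minimal sketch of that call, assuming ASM 1.8 or 1.9 with no preview features. The function names are hypothetical and not taken from install_asm, and the Content-Type header is an assumption; only the URL, the empty payload, and the bearer token come from the description above.

```shell
# Hypothetical helper: build the initialize URL for a given project.
meshconfig_init_url() {
  local project_id="$1"
  printf 'https://meshconfig.googleapis.com/v1alpha1/projects/%s:initialize' "${project_id}"
}

# Hypothetical helper: POST an empty payload, authenticated with the
# token printed by `gcloud auth print-access-token`.
# The Content-Type header is an assumption, not something stated above.
initialize_meshconfig() {
  local project_id="$1"
  local token
  token="$(gcloud auth print-access-token)"
  curl --fail --silent --show-error --request POST \
    --header "Authorization: Bearer ${token}" \
    --header "Content-Type: application/json" \
    --data '' \
    "$(meshconfig_init_url "${project_id}")"
}

# Usage (needs gcloud, curl, and permission to make project-level changes):
#   initialize_meshconfig "${PROJECT_ID}"
```

Nothing runs when the snippet is sourced; the network call only happens when initialize_meshconfig is invoked explicitly.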

The reason that particular call is alpha even though the entire API isn't is because soon(tm) it won't be necessary, and just enabling the meshconfig API will be enough to kick off all of the machinery necessary. I guess concretely: when you do document it, add a comment to the effect of "this might be unnecessary later so let's not take any hard dependencies on it."

If you just want to separate out dependencies for different roles, you can use the flag combination of --only-enable --enable-gcp-components to set up meshconfig, Workload Identity, and Stackdriver without doing anything else. It doesn't help your documentation much, but it might help if the people installing ASM aren't the same ones with permissions to make project-level changes.

zerobfd avatar Apr 23 '21 16:04 zerobfd

If I were using preview features, do I need to do something different with the initialize call? In this case I've tried --option hub-meshca. Using environ workload identity isn't a dealbreaker for me, so I'm tempted to just turn it off, although from the documentation it sounded like it was the future of workload identity. I've just tried this and I now get a slightly different error in the gateway pod, with a permission denied:

2021-04-23T16:47:42.101021Z	error	token	access token response does not have access token{
  "error": {
    "code": 403,
    "message": "The caller does not have permission",
    "status": "PERMISSION_DENIED"
  }
}
2021-04-23T16:47:42.101048Z	warn	stsserver	token manager fails to generate token: access token response does not have access token. {
  "error": {
    "code": 403,
E0423 16:47:42.101290625      34 oauth2_credentials.cc:157]  Call to http server ended with error 500 [{
  "error": "invalid_target",
  "error_description": "access token response does not have access token. {\n  \"error\": {\n    \"code\": 403,\n    \"message\": \"The caller does not have permission\",\n    \"status\": \"PERMISSION_DENIED\"\n  }\n}\n",
  "error_uri": ""
}].
CreateTimeSeries request failed (1 RPCs, 16 views, 4 timeseries): UNAVAILABLE: Error occurred when fetching oauth2 token.
    "message": "The caller does not have permission",
    "status": "PERMISSION_DENIED"
  }
}

I'm guessing this is because I'm trying to use a preview feature here?

NickAJScott avatar Apr 23 '21 16:04 NickAJScott

I might have solved my own problem by digging through the script. I think I need to use this for the POST data rather than an empty block:

    local POST_DATA; POST_DATA='{"workloadIdentityPools":["'${ENVIRON_PROJECT_ID}'.hub.id.goog","'${ENVIRON_PROJECT_ID}'.svc.id.goog"]}'
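For anyone making the call outside the script, here is a rough sketch of building that same payload by hand. The function name is hypothetical; the JSON shape and the two pool suffixes are copied from the line above, and since this is a preview feature it may well change.

```shell
# Hypothetical helper: build the initialize payload that names both
# workload identity pools for the environ project.
meshconfig_init_payload() {
  local environ_project_id="$1"
  printf '{"workloadIdentityPools":["%s.hub.id.goog","%s.svc.id.goog"]}' \
    "${environ_project_id}" "${environ_project_id}"
}

# Usage: pass the output as the POST body instead of an empty payload, e.g.
#   curl --request POST --data "$(meshconfig_init_payload "${ENVIRON_PROJECT_ID}")" ...
```

Using printf with %s placeholders avoids the quote juggling in the one-liner, but it does no JSON escaping, so it assumes the project ID contains no quotes or backslashes.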

NickAJScott avatar Apr 23 '21 16:04 NickAJScott

Yep! That should be simplified soon as well, those two pools should both end up under the svc.id.goog name and whether it's Environ or not will get inferred depending on the context. (e.g. if the cluster is registered then the WI is Environ-wide, etc.)

Standard caveat that it's preview, nothing is concrete until it's released, etc. etc.

zerobfd avatar Apr 23 '21 17:04 zerobfd

Ok great, thanks for the help :)

NickAJScott avatar Apr 23 '21 18:04 NickAJScott