architecture icon indicating copy to clipboard operation
architecture copied to clipboard

Clean up Google Cloud

Open frog32 opened this issue 5 years ago • 18 comments

Context

I am currently working on a Stackdriver Logging integration (or is logging the platform and Google Cloud the integration?) and discovered that there is at least one more integration that uses Google Cloud service account files.

A potential config would look the following (including my future Stackdriver integration):

tts:
  - platform: google_cloud
    key_file: googlecloud.json

google_pubsub:
  project_id: smart-home14
  topic_name: main
  credentials_json: googlecloud.json

google_stackdriver:
  key_file: googlecloud.json
  log_level: info
  project_id: smart-home14
  labels:
    source: home-assistant

Proposal

Create a Google Cloud integration that as config has a service account file which I could also leverage for the Stackdriver integration. The above example would then look the following:

google_cloud:
  key_file: googlecloud.json

tts:
  - platform: google_cloud

google_pubsub:
  project_id: smart-home14
  topic_name: main

google_stackdriver:
  log_level: info
  labels:
    source: home-assistant

Consequences

This would introduce two breaking changes but will allow for more streamlined configuration going forward.

This does not allow for separate service accounts anymore. Separate service accounts only make sense if they either add security or allow better accounting. Once Home Assistant is breached it would be trivial to also extract the second service account and accounting can be done on the API level since Google Cloud bills this separately.

Any help with getting the platform and integration part right is welcome. I for example thought if stackdriver config should be part of logging in case there will be additional logging platforms in the future.

In case it is not clear from my Github profile I am actually working for Google Cloud but my job is not directly related to Home Assistant and I am mostly doing this on my own time.

frog32 avatar Dec 02 '19 22:12 frog32

Would there be any other benefits besides only having to specify the file once in your config ?

balloob avatar Dec 03 '19 00:12 balloob

I think that is unnecessary. As long as they share helper code related to handling the service account I think that would be okey.

google_assistant also use service accounts.

elupus avatar Dec 03 '19 00:12 elupus

No that would pretty much be the only benefit. @elupus the code to handle the service account is provided by the core component in https://github.com/googleapis/google-cloud-python whats left is the different name for the key file that is named credentials_json in pubsub and key_file for the TTS API.

frog32 avatar Dec 03 '19 08:12 frog32

Yeah we should align the config keys

balloob avatar Dec 03 '19 20:12 balloob

Can we use yaml anchors & extend there? That would minimize the DRY violation.

ties avatar Dec 03 '19 21:12 ties

If we are concerned about the repetition then we should fix it by getting rid of the repetition and not by using yaml anchors. Imagine that in a world where most of our config is doable from the frontend. Then you would set up Google Cloud which would ask you for the credential file that you would have to either paste or upload. Then you can essentially with one click enable Stackdriver and if you want to use PubSub you have to specify two more parameters.

However I agree with @balloob that this does not have to happen right now and aligning the config files by changing the PubSub integration to also use key_file instead of credentials_json would solve that. I am proposing to change PubSub because it seems that this gets used less frequently than TTS and I personally don't think the config should have the word json in it since that's an implementation detail.

If we have data on which component gets used more it might help with deciding this.

frog32 avatar Dec 04 '19 11:12 frog32

Note: json is valid yaml. So preferable it's better if they just take a dict value and you can use

key_data: !include key_file.json

You can then get it from secrets store.

key_data: !secret MYKEY

elupus avatar Dec 04 '19 11:12 elupus

Reference on above way to load the json file: https://github.com/home-assistant/home-assistant/blob/992d9273bb68690fadfb2232d47df15f6314956d/homeassistant/components/google_assistant/init.py#L83

elupus avatar Dec 04 '19 11:12 elupus

If we are concerned about the repetition then we should fix it by getting rid of the repetition and not by using yaml anchors. Imagine that in a world where most of our config is doable from the frontend. Then you would set up Google Cloud which would ask you for the credential file that you would have to either paste or upload. Then you can essentially with one click enable Stackdriver and if you want to use PubSub you have to specify two more parameters.

YAML anchors are not a solution but I personally prefer them over the magic loaders in the yaml.

In the end credentials are a specialized type of entity that you would like to refer to and store in a specialized store (not in yaml, ideally in hardware backed storage).

That would also translate to the .storage-storage.

ties avatar Dec 04 '19 19:12 ties

@elupus the library expects a key_file and not the credentials themself, so it does not make a ton of sense to load that into the config. https://github.com/googleapis/google-cloud-python/blob/master/api_core/docs/auth.rst#service-accounts

frog32 avatar Dec 04 '19 21:12 frog32

Credentials.from_service_account_info() exists. Which is what that function use.

elupus avatar Dec 04 '19 21:12 elupus

I know it exists but I see no value in including this into the configuration. It just opens room for errors and if the key_file has a new field in the future we have to adjust validation on Home Assistant side. If the only reason to do that is to use the secrets store instead of a key file that is standard across Google Cloud (I can double check with folks working on this part of Google Cloud tomorrow) we should use it to avoid having to manage the format of this file.

frog32 avatar Dec 04 '19 22:12 frog32

I'd say it reduces risk since we can do volouptous validation on the data and give reasonable errors on config validation rather than handling an exception from Google's lib.

It also allow storage of this data using GUI in the normal config flows without need to store as a json file in the background.

elupus avatar Dec 04 '19 22:12 elupus

I double checked and even at Google internally we recommend using the file instead of manually extracting the key. Those files can in theory contain more information than they now do and the Google provided library if kept up to date is guaranteed to be able to read the file.

Unless you provide me with a library that will be supported going forward I will opt for the official way and cover the case that the file could not be parsed (this would only happen if someone manually changes the file or our version of the library is horribly out of date).

If this gets implemented as config flow I would recommend to also directly use the file (uploading a file that you get from the Cloud Console is less error prone than having to extract the key yourself.

frog32 avatar Dec 05 '19 10:12 frog32

I double checked and even at Google internally we recommend using the file instead of manually extracting the key. Those files can in theory contain more information than they now do and the Google provided library if kept up to date is guaranteed to be able to read the file.

Very quick reply (without checking docs): Does the library provide the ability to validate a key file? That could easily be used in a custom validator.

ties avatar Dec 06 '19 08:12 ties

No the library tries to create credentials and then throws a value error if it does not succeed. The actual loading of the credentials is done by a private function that throws a value error.

I still think we should not try to validate the file and instead at run time throw an error if the file seems broken. If there is a use case where having it in yaml is enabling us to do more than just using the original file I am happy to look into it. However I feel that all this provides is more ways in how users can break their configuration. I know power-users want their configuration to use secrets but the entire content of this file should not be public so if everything that happens is that one has to convert the json file to yaml and then pack it into the secret file to then include it again we are over-complicating things.

frog32 avatar Dec 06 '19 10:12 frog32

Just for reference. No conversion is needed for secrets file. json is perfectly valid yaml. So it's just copy paste.

Anyway it's fully possible to support both cases at some point in the future in a backwords compatible way. So if it's a point of disagreement, let's ignore it for now and get back to the centralization of the information and use the implementors preferred choice.

elupus avatar Dec 06 '19 12:12 elupus

I have created an initial version https://github.com/home-assistant/home-assistant/pull/29590 next step is to allow changing of the log level and propper error handling when we can not load the file.

frog32 avatar Dec 07 '19 12:12 frog32

This architecture issue is old, stale, and possibly obsolete. Things changed a lot over the years. Additionally, we have been moving to discussions for these architectural discussions.

For that reason, I'm going to close this issue.

../Frenck

frenck avatar May 11 '23 13:05 frenck