architecture
architecture copied to clipboard
Clean up Google Cloud
Context
I am currently working on a Stackdriver Logging integration (or is logging the platform and Google Cloud the integration?) and discovered that there is at least one more integration that uses Google Cloud service account files.
A potential config would look the following (including my future Stackdriver integration):
tts:
- platform: google_cloud
key_file: googlecloud.json
google_pubsub:
project_id: smart-home14
topic_name: main
credentials_json: googlecloud.json
google_stackdriver:
key_file: googlecloud.json
log_level: info
project_id: smart-home14
labels:
source: home-assistant
Proposal
Create a Google Cloud integration that as config has a service account file which I could also leverage for the Stackdriver integration. The above example would then look the following:
google_cloud:
key_file: googlecloud.json
tts:
- platform: google_cloud
google_pubsub:
project_id: smart-home14
topic_name: main
google_stackdriver:
log_level: info
labels:
source: home-assistant
Consequences
This would introduce two breaking changes but will allow for more streamlined configuration going forward.
This does not allow for separate service accounts anymore. Separate service accounts only make sense if they either add security or allow better accounting. Once Home Assistant is breached it would be trivial to also extract the second service account and accounting can be done on the API level since Google Cloud bills this separately.
Any help with getting the platform and integration part right is welcome. I for example thought if stackdriver config should be part of logging in case there will be additional logging platforms in the future.
In case it is not clear from my Github profile I am actually working for Google Cloud but my job is not directly related to Home Assistant and I am mostly doing this on my own time.
Would there be any other benefits besides only having to specify the file once in your config ?
I think that is unnecessary. As long as they share helper code related to handling the service account I think that would be okey.
google_assistant also use service accounts.
No that would pretty much be the only benefit.
@elupus the code to handle the service account is provided by the core component in https://github.com/googleapis/google-cloud-python whats left is the different name for the key file that is named credentials_json
in pubsub and key_file for the TTS API.
Yeah we should align the config keys
Can we use yaml anchors & extend there? That would minimize the DRY violation.
If we are concerned about the repetition then we should fix it by getting rid of the repetition and not by using yaml anchors. Imagine that in a world where most of our config is doable from the frontend. Then you would set up Google Cloud which would ask you for the credential file that you would have to either paste or upload. Then you can essentially with one click enable Stackdriver and if you want to use PubSub you have to specify two more parameters.
However I agree with @balloob that this does not have to happen right now and aligning the config files by changing the PubSub integration to also use key_file
instead of credentials_json
would solve that. I am proposing to change PubSub because it seems that this gets used less frequently than TTS and I personally don't think the config should have the word json in it since that's an implementation detail.
If we have data on which component gets used more it might help with deciding this.
Note: json is valid yaml. So preferable it's better if they just take a dict value and you can use
key_data: !include key_file.json
You can then get it from secrets store.
key_data: !secret MYKEY
Reference on above way to load the json file: https://github.com/home-assistant/home-assistant/blob/992d9273bb68690fadfb2232d47df15f6314956d/homeassistant/components/google_assistant/init.py#L83
If we are concerned about the repetition then we should fix it by getting rid of the repetition and not by using yaml anchors. Imagine that in a world where most of our config is doable from the frontend. Then you would set up Google Cloud which would ask you for the credential file that you would have to either paste or upload. Then you can essentially with one click enable Stackdriver and if you want to use PubSub you have to specify two more parameters.
YAML anchors are not a solution but I personally prefer them over the magic loaders in the yaml.
In the end credentials are a specialized type of entity that you would like to refer to and store in a specialized store (not in yaml, ideally in hardware backed storage).
That would also translate to the .storage
-storage.
@elupus the library expects a key_file and not the credentials themself, so it does not make a ton of sense to load that into the config. https://github.com/googleapis/google-cloud-python/blob/master/api_core/docs/auth.rst#service-accounts
Credentials.from_service_account_info() exists. Which is what that function use.
I know it exists but I see no value in including this into the configuration. It just opens room for errors and if the key_file has a new field in the future we have to adjust validation on Home Assistant side. If the only reason to do that is to use the secrets store instead of a key file that is standard across Google Cloud (I can double check with folks working on this part of Google Cloud tomorrow) we should use it to avoid having to manage the format of this file.
I'd say it reduces risk since we can do volouptous validation on the data and give reasonable errors on config validation rather than handling an exception from Google's lib.
It also allow storage of this data using GUI in the normal config flows without need to store as a json file in the background.
I double checked and even at Google internally we recommend using the file instead of manually extracting the key. Those files can in theory contain more information than they now do and the Google provided library if kept up to date is guaranteed to be able to read the file.
Unless you provide me with a library that will be supported going forward I will opt for the official way and cover the case that the file could not be parsed (this would only happen if someone manually changes the file or our version of the library is horribly out of date).
If this gets implemented as config flow I would recommend to also directly use the file (uploading a file that you get from the Cloud Console is less error prone than having to extract the key yourself.
I double checked and even at Google internally we recommend using the file instead of manually extracting the key. Those files can in theory contain more information than they now do and the Google provided library if kept up to date is guaranteed to be able to read the file.
Very quick reply (without checking docs): Does the library provide the ability to validate a key file? That could easily be used in a custom validator.
No the library tries to create credentials and then throws a value error if it does not succeed. The actual loading of the credentials is done by a private function that throws a value error.
I still think we should not try to validate the file and instead at run time throw an error if the file seems broken. If there is a use case where having it in yaml is enabling us to do more than just using the original file I am happy to look into it. However I feel that all this provides is more ways in how users can break their configuration. I know power-users want their configuration to use secrets but the entire content of this file should not be public so if everything that happens is that one has to convert the json file to yaml and then pack it into the secret file to then include it again we are over-complicating things.
Just for reference. No conversion is needed for secrets file. json is perfectly valid yaml. So it's just copy paste.
Anyway it's fully possible to support both cases at some point in the future in a backwords compatible way. So if it's a point of disagreement, let's ignore it for now and get back to the centralization of the information and use the implementors preferred choice.
I have created an initial version https://github.com/home-assistant/home-assistant/pull/29590 next step is to allow changing of the log level and propper error handling when we can not load the file.
This architecture issue is old, stale, and possibly obsolete. Things changed a lot over the years. Additionally, we have been moving to discussions for these architectural discussions.
For that reason, I'm going to close this issue.
../Frenck