OpenMetadata
Add a Secret Management Store interface
Context
Sensitive data from services is currently stored encrypted in MySQL (or Postgres). This is the current process between the OM server and Airflow workflows:
The IngestionPipeline contains the service information as an EntityReference. In Airflow, we use that reference to read the Service information, including its connection details. This information goes from Database > OM > Airflow.
We aim to improve this process so that OpenMetadata does not necessarily store sensitive information in any of its systems. By introducing an interface to communicate with any Key Management Store (KMS), users can choose to use the underlying database as the KMS, or an external system such as AWS KMS, Azure Key Vault, or HashiCorp Vault.
This way, OM won't store or share any sensitive information. Instead, the KMS will act as a mediator.
Tasks
Backend
Requirements to be covered:
- Prepare an interface in the backend so that users can configure which Secret Manager to use: Local (the current approach of encrypting the connections' sensitive data with Fernet) or AWS Secrets Manager. The Service Entity will then no longer store sensitive information directly.
- All the service connection JSON payloads of the service will be stored in the secret manager.
- Connections will never be shown to users.
- If the application has a secret manager configured:
  - only the admin user will be able to retrieve/edit service connections.
  - the connection test will stop sending the connection parameters to Airflow.
- Save credentials when bootstrapping.
- Migration mechanism to save all the current connection data of our services into a secret manager when bootstrapping.
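The migration requirement above can be sketched roughly as follows. This is a minimal illustration, not the project's implementation: `InMemorySecretsManager`, `build_secret_id`, the `/openmetadata/...` naming scheme, and the `connectionSecretId` field are all hypothetical names introduced here, and services are modelled as plain dictionaries.

```python
import json
from typing import Any, Dict, List


class InMemorySecretsManager:
    """Hypothetical stand-in for a real secrets manager backend."""

    def __init__(self) -> None:
        self._secrets: Dict[str, str] = {}

    def put(self, secret_id: str, value: str) -> None:
        self._secrets[secret_id] = value

    def get(self, secret_id: str) -> str:
        return self._secrets[secret_id]


def build_secret_id(service_type: str, service_name: str) -> str:
    # Assumed naming scheme; the real one is not specified in this proposal.
    return f"/openmetadata/{service_type}/{service_name}"


def migrate_connections(
    services: List[Dict[str, Any]], secrets_manager: InMemorySecretsManager
) -> List[Dict[str, Any]]:
    """Move each service's connection payload into the secrets manager.

    Returns copies of the services where the payload is replaced by a
    reference (the secret id), so the entity no longer carries the secret.
    """
    migrated = []
    for service in services:
        connection = service.get("connection")
        if connection is None:
            # Nothing to migrate for this service.
            migrated.append(dict(service))
            continue
        secret_id = build_secret_id(service["serviceType"], service["name"])
        secrets_manager.put(secret_id, json.dumps(connection, sort_keys=True))
        stripped = {k: v for k, v in service.items() if k != "connection"}
        stripped["connectionSecretId"] = secret_id
        migrated.append(stripped)
    return migrated
```

Running this at bootstrap time would leave only the reference on each service, while the JSON payload lives in the secrets manager.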
Task breakdown:
- https://github.com/open-metadata/OpenMetadata/issues/5915
- https://github.com/open-metadata/OpenMetadata/issues/5916
- https://github.com/open-metadata/OpenMetadata/issues/5917
- https://github.com/open-metadata/OpenMetadata/issues/5918
- https://github.com/open-metadata/OpenMetadata/issues/5919
- https://github.com/open-metadata/OpenMetadata/issues/6487
- https://github.com/open-metadata/OpenMetadata/issues/6511
- https://github.com/open-metadata/OpenMetadata/issues/6517
- https://github.com/open-metadata/OpenMetadata/issues/6643
Ingestion
We should:
- Prepare the same interface on the ingestion side, so we can safely retrieve this information from Airflow.
- Get connections from the secret manager. We can pass the secret store configuration to Airflow via airflow.cfg.
- Change the way we use the security configuration if the credentials are saved in the secrets manager.
  - e.g. put the ingestion bot JWT token into the secret store. When starting Airflow, pick it up from there as well. We should not share the OpenMetadataJWTClientConfig with Airflow via the API; Airflow should read it from the secret store. The server bootstraps and creates this token at start time and stores it in the secret manager. Then, when Airflow starts, it knows where to pick the token up based on the airflow.cfg configuration.
- Add back the hostPort to the charts, dashboards, and tasks URLs since the connection parameters will be hidden for users if the secret manager is configured.
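As a rough sketch of how the ingestion side might read the secret store configuration from airflow.cfg, the snippet below uses Python's standard `configparser`. The section name `openmetadata_secrets_manager` and the keys `provider` and `aws_region` are assumptions made for illustration; the proposal does not define them.

```python
import configparser
from typing import Dict

# Hypothetical section name; not defined by the proposal.
SECTION = "openmetadata_secrets_manager"


def load_secret_store_settings(cfg_text: str) -> Dict[str, str]:
    """Extract the secret store settings from the text of an airflow.cfg file.

    Falls back to a local provider when the section is absent.
    """
    # interpolation=None so that literal '%' characters in the file are kept as-is.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)
    if not parser.has_section(SECTION):
        return {"provider": "local"}
    return dict(parser.items(SECTION))


EXAMPLE_CFG = """
[core]
executor = LocalExecutor

[openmetadata_secrets_manager]
provider = aws
aws_region = eu-west-1
"""

settings = load_secret_store_settings(EXAMPLE_CFG)
```

With settings like these, Airflow would know which store to query for the connection payloads and the ingestion bot JWT token, without the server sending them over the API.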
Task breakdown:
- https://github.com/open-metadata/OpenMetadata/issues/6212
- https://github.com/open-metadata/OpenMetadata/issues/5920
- https://github.com/open-metadata/OpenMetadata/issues/5921
- https://github.com/open-metadata/OpenMetadata/issues/5924
- https://github.com/open-metadata/OpenMetadata/issues/6511
- https://github.com/open-metadata/OpenMetadata/issues/6643
Documentation
- https://github.com/open-metadata/OpenMetadata/issues/6512
All the service connections currently use Fernet as the default algorithm to encrypt and decrypt passwords.
The solution will be to move to a design that allows us to implement different Secret Store Managers.
We can keep using Fernet, which will be wrapped in the LocalSecretStoreManager implementation.
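One possible shape for such an interface is sketched below. The method names `store_connection` and `retrieve_connection`, the `local/` secret id prefix, and the in-memory dictionary standing in for the database are assumptions for illustration. The local implementation takes `encrypt` and `decrypt` callables that map bytes to bytes, which is the shape of `Fernet(key).encrypt` and `Fernet(key).decrypt` in the `cryptography` package; the example passes base64 functions purely as a placeholder, and base64 is an encoding, not encryption.

```python
import base64
import json
from abc import ABC, abstractmethod
from typing import Callable, Dict


class SecretStoreManager(ABC):
    """Hypothetical interface: store and retrieve a service connection payload."""

    @abstractmethod
    def store_connection(self, service_name: str, connection: dict) -> str:
        """Persist the payload and return the secret id kept on the service."""

    @abstractmethod
    def retrieve_connection(self, secret_id: str) -> dict:
        """Return the connection payload stored under the given secret id."""


class LocalSecretStoreManager(SecretStoreManager):
    """Keeps encrypted payloads locally; a dict stands in for the database."""

    def __init__(
        self,
        encrypt: Callable[[bytes], bytes],
        decrypt: Callable[[bytes], bytes],
    ) -> None:
        self._encrypt = encrypt
        self._decrypt = decrypt
        self._rows: Dict[str, bytes] = {}

    def store_connection(self, service_name: str, connection: dict) -> str:
        secret_id = f"local/{service_name}"
        payload = json.dumps(connection).encode("utf-8")
        self._rows[secret_id] = self._encrypt(payload)
        return secret_id

    def retrieve_connection(self, secret_id: str) -> dict:
        payload = self._decrypt(self._rows[secret_id])
        return json.loads(payload.decode("utf-8"))


# Placeholder codec only: swap in Fernet(key).encrypt / Fernet(key).decrypt.
manager = LocalSecretStoreManager(encrypt=base64.b64encode, decrypt=base64.b64decode)
secret_id = manager.store_connection(
    "prod_mysql", {"username": "om", "password": "s3cret"}
)
```

An external implementation (for example, one backed by AWS Secrets Manager) would implement the same two methods, so callers would not need to know which store is configured.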
Then, we can configure which Secret Store Manager to use in the application configuration YAML file, for example:
    secretStoreManagerConfiguration:
      # secretStoreManager: ${SECRET_MANAGER_CLASS_NAME:-LocalSecretStoreManager}
      secretStoreManager: AWSSecretStoreManager
      configuration:
        region: eu-west-1
        accessKeyId: 1234asdf
        secretAccessKey: qwer5678
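To illustrate how a configuration like this could select an implementation, here is a minimal sketch. It assumes the YAML section has already been parsed into a dictionary, and that the `${NAME:-default}` form follows shell-style semantics (use the default when the variable is unset or empty). `resolve_env`, `REGISTRY`, `build_secret_store_manager`, and the two placeholder classes are hypothetical; the placeholder AWS class only holds settings and does not call AWS.

```python
import os
import re
from dataclasses import dataclass
from typing import Any, Dict, Mapping, Optional

# Matches ${NAME:-default} placeholders.
_ENV_DEFAULT = re.compile(r"\$\{(\w+):-([^}]*)\}")


def resolve_env(value: str, env: Mapping[str, str]) -> str:
    """Replace ${NAME:-default} with env[NAME], or default if unset or empty."""
    return _ENV_DEFAULT.sub(lambda m: env.get(m.group(1)) or m.group(2), value)


@dataclass
class LocalSecretStoreManager:
    """Placeholder for the Fernet-backed local implementation."""


@dataclass
class AWSSecretStoreManager:
    """Placeholder that only holds settings; it makes no AWS calls."""

    region: str
    accessKeyId: Optional[str] = None
    secretAccessKey: Optional[str] = None


REGISTRY = {
    "LocalSecretStoreManager": LocalSecretStoreManager,
    "AWSSecretStoreManager": AWSSecretStoreManager,
}


def build_secret_store_manager(
    section: Dict[str, Any], env: Optional[Mapping[str, str]] = None
) -> Any:
    """Instantiate the manager named in the parsed configuration section."""
    env = os.environ if env is None else env
    name = resolve_env(section["secretStoreManager"], env)
    if name not in REGISTRY:
        raise ValueError(f"Unknown secret store manager: {name}")
    return REGISTRY[name](**section.get("configuration", {}))
```

Passing the environment explicitly keeps the function easy to test; in the application it would default to the process environment.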
To discuss:
There are two approaches to external Secret Management services:
- SaaS-focused: we use AWS Secrets Manager to write the secret, and read it from the API and Airflow. This improves security, as no sensitive information ever reaches the database, even in encrypted form.
- Client integration: users already have a Secret Management System, which they configure in the OM config; when services are created, we just read the key name (not the actual password). This is a READ-ONLY approach that lets users reuse their existing infrastructure.
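The client integration approach could look roughly like the sketch below, offered only as a discussion aid. `ReadOnlySecretStore`, `resolve_service_password`, and the `passwordKeyName` field are hypothetical names: the service keeps only the key name, the value is resolved from the user's existing store at read time, and writes are rejected.

```python
from typing import Dict, Mapping


class ReadOnlySecretStore:
    """Hypothetical view over a user-managed secret store; writes are refused."""

    def __init__(self, existing_secrets: Mapping[str, str]) -> None:
        self._secrets: Dict[str, str] = dict(existing_secrets)

    def get(self, key_name: str) -> str:
        return self._secrets[key_name]

    def put(self, key_name: str, value: str) -> None:
        raise PermissionError("This secret store is configured as read-only")


def resolve_service_password(service: Dict[str, str], store: ReadOnlySecretStore) -> str:
    """Look up the password by the key name stored on the service."""
    return store.get(service["passwordKeyName"])


# The service definition carries a key name, never the password itself.
store = ReadOnlySecretStore({"teams/data/mysql-password": "s3cret"})
service = {"name": "prod_mysql", "passwordKeyName": "teams/data/mysql-password"}
password = resolve_service_password(service, store)
```

In the SaaS-focused approach, by contrast, the store would also accept writes, since OM itself creates the secrets.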