Add a Secret Management Store interface

Open pmbrull opened this issue 2 years ago • 2 comments

Context

Sensitive data from services is currently stored encrypted in MySQL (or Postgres). This is the current process between the OM server and Airflow workflows:

image

The IngestionPipeline contains the service information as an EntityReference. In Airflow, we use that reference to read the Service information, including its connection details. This information goes from Database > OM > Airflow.

We aim to improve this process so that OpenMetadata does not necessarily have to store sensitive information in any of its systems. By introducing an interface to communicate with any Key Management Store (KMS), users can choose to use the underlying database as the KMS, or an external system such as AWS KMS, Azure Key Vault, or HashiCorp Vault.

This way, OM will neither store nor share any sensitive information. Instead, the KMS will act as a mediator:

image

Tasks

Backend

Requirements to be covered:

  • Prepare an interface in the backend so that users can configure which Secret Manager to use: Local (the current approach of encrypting the connections' sensitive data with Fernet) or AWS Secrets Manager. The Service Entity will then no longer store sensitive information directly.
  • All service connection JSON payloads will be stored in the secret manager.
  • Connections will never be shown to users.
  • If the application has a secret manager configured:
    • only the admin user will be able to retrieve/edit service connections.
    • the connection test will stop sending the connection parameters to airflow.
    • credentials will be saved when bootstrapping.
  • Migration mechanism to save all the current connection data of our services into a secret manager when bootstrapping.
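
The migration requirement above can be sketched roughly as follows. This is only an illustration under stated assumptions: the `SecretsManager` contract, the in-memory stand-in, the secret ID layout, and the `connectionSecretId` field are all hypothetical names, not the actual OpenMetadata backend API.

```python
import json
from typing import Dict, List, Protocol


class SecretsManager(Protocol):
    """Hypothetical minimal contract: store and fetch a secret by ID."""

    def store(self, secret_id: str, value: str) -> None: ...

    def retrieve(self, secret_id: str) -> str: ...


class InMemorySecretsManager:
    """Stand-in for a real backend, used only to make the sketch runnable."""

    def __init__(self) -> None:
        self._secrets: Dict[str, str] = {}

    def store(self, secret_id: str, value: str) -> None:
        self._secrets[secret_id] = value

    def retrieve(self, secret_id: str) -> str:
        return self._secrets[secret_id]


def migrate_connections(services: List[dict], manager: SecretsManager) -> None:
    """Move each service's connection payload into the secrets manager.

    Afterwards the service record keeps only a reference (the secret ID).
    """
    for service in services:
        connection = service.pop("connection", None)
        if connection is None:
            continue  # already migrated, or nothing to move
        # Assumed ID layout; the issue does not prescribe one.
        secret_id = f"/openmetadata/{service['serviceType']}/{service['name']}"
        manager.store(secret_id, json.dumps(connection))
        service["connectionSecretId"] = secret_id
```

Running the migration a second time is a no-op in this sketch, because migrated records no longer carry a `connection` field.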

Task breakdown:

  • https://github.com/open-metadata/OpenMetadata/issues/5915
  • https://github.com/open-metadata/OpenMetadata/issues/5916
  • https://github.com/open-metadata/OpenMetadata/issues/5917
  • https://github.com/open-metadata/OpenMetadata/issues/5918
  • https://github.com/open-metadata/OpenMetadata/issues/5919
  • https://github.com/open-metadata/OpenMetadata/issues/6487
  • https://github.com/open-metadata/OpenMetadata/issues/6511
  • https://github.com/open-metadata/OpenMetadata/issues/6517
  • https://github.com/open-metadata/OpenMetadata/issues/6643

Ingestion

We should:

  • Prepare the same interface on the ingestion side, so we can safely retrieve this information from Airflow.
  • Get connections from the secret manager. We can pass the secret store configuration to Airflow via airflow.cfg.
  • Change the way we use the security configuration if the credentials are saved in the secrets manager.
    • e.g., put the ingestion bot JWT token into the secret store and have Airflow pick it up from there at startup. We should not share the OpenMetadataJWTClientConfig with Airflow via the API. Instead, the server creates this token when it bootstraps and stores it in the secret manager. When Airflow starts, it knows where to pick the token up based on the airflow.cfg configuration.
  • Add back the hostPort to the charts, dashboards, and tasks URLs, since the connection parameters will be hidden from users if the secret manager is configured.
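
As a rough sketch of the airflow.cfg idea above, the ingestion side could read a dedicated section with Python's standard `configparser`. The section name `openmetadata_secrets_manager` and its keys are assumptions made for illustration; the issue does not define them.

```python
import configparser

# Hypothetical section and key names; the issue does not define them.
SAMPLE_AIRFLOW_CFG = """
[openmetadata_secrets_manager]
secret_store_manager = AWSSecretStoreManager
region = eu-west-1
"""


def load_secret_store_config(cfg_text: str) -> dict:
    """Return the secret store settings found in an airflow.cfg-style text."""
    # Interpolation is disabled so '%' characters elsewhere in the file are harmless.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)
    section = "openmetadata_secrets_manager"
    if not parser.has_section(section):
        # No section configured: fall back to the local (Fernet-based) approach.
        return {"secret_store_manager": "LocalSecretStoreManager"}
    return dict(parser[section])
```

With such a section in place, Airflow would know at startup which store to query for the ingestion bot token, without the server sending it over the API.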

Task breakdown:

  • https://github.com/open-metadata/OpenMetadata/issues/6212
  • https://github.com/open-metadata/OpenMetadata/issues/5920
  • https://github.com/open-metadata/OpenMetadata/issues/5921
  • https://github.com/open-metadata/OpenMetadata/issues/5924
  • https://github.com/open-metadata/OpenMetadata/issues/6511
  • https://github.com/open-metadata/OpenMetadata/issues/6643

Documentation

  • https://github.com/open-metadata/OpenMetadata/issues/6512

pmbrull avatar Jun 30 '22 14:06 pmbrull

All the service connections use Fernet as the default algorithm to encrypt/decrypt passwords, in the following way:

image

The solution will be to move to something that allows us to implement different Secret Store Managers:

image

We can keep using Fernet, which will be wrapped in the LocalSecretStoreManager implementation.

Then, we can configure which Secret Store Manager we want to use in our application configuration YAML file, for example:

secretStoreManagerConfiguration:
#  secretStoreManager: ${SECRET_MANAGER_CLASS_NAME:-LocalSecretStoreManager}
  secretStoreManager: AWSSecretStoreManager
  configuration:
    region: eu-west-1
    accessKeyId: 1234asdf
    secretAccessKey: qwer5678
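
To illustrate how the `secretStoreManager` value could select an implementation, here is a minimal sketch that takes the already-parsed configuration as a dict. The method names, the registry, and both class bodies are placeholders assumed for the example: the local class omits the Fernet encryption and the AWS class makes no AWS calls.

```python
from abc import ABC, abstractmethod
from typing import Dict, Optional, Type


class SecretStoreManager(ABC):
    """Hypothetical common interface for all secret store managers."""

    def __init__(self, configuration: Optional[dict] = None) -> None:
        self.configuration = configuration or {}

    @abstractmethod
    def store_secret(self, name: str, value: str) -> None: ...

    @abstractmethod
    def get_secret(self, name: str) -> str: ...


class LocalSecretStoreManager(SecretStoreManager):
    """Placeholder for the Fernet-backed default; encryption is omitted here."""

    def __init__(self, configuration: Optional[dict] = None) -> None:
        super().__init__(configuration)
        self._values: Dict[str, str] = {}

    def store_secret(self, name: str, value: str) -> None:
        self._values[name] = value

    def get_secret(self, name: str) -> str:
        return self._values[name]


class AWSSecretStoreManager(SecretStoreManager):
    """Placeholder: a real version would call AWS Secrets Manager."""

    def store_secret(self, name: str, value: str) -> None:
        raise NotImplementedError("AWS calls are out of scope for this sketch")

    def get_secret(self, name: str) -> str:
        raise NotImplementedError("AWS calls are out of scope for this sketch")


REGISTRY: Dict[str, Type[SecretStoreManager]] = {
    "LocalSecretStoreManager": LocalSecretStoreManager,
    "AWSSecretStoreManager": AWSSecretStoreManager,
}


def build_secret_store_manager(app_config: dict) -> SecretStoreManager:
    """Instantiate the manager named in secretStoreManagerConfiguration."""
    section = app_config.get("secretStoreManagerConfiguration", {})
    name = section.get("secretStoreManager", "LocalSecretStoreManager")
    if name not in REGISTRY:
        raise ValueError(f"Unknown secret store manager: {name}")
    return REGISTRY[name](section.get("configuration"))
```

Defaulting to `LocalSecretStoreManager` when nothing is configured mirrors the commented-out default in the YAML above.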

nahuelverdugo avatar Jul 04 '22 08:07 nahuelverdugo

To discuss:

There are two approaches to external Secret Management services:

  1. SaaS-focused: we use AWS Secrets Manager to write the secret and read it from the API and Airflow. This improves security, as no sensitive info will ever reach the db, not even in encrypted form.
  2. Client integration: users already have a Secret Management System, which they can configure in the OM config. We then just read the key name (not the actual password) when services are created. This is a READ-ONLY approach that lets users reuse their existing infra.
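
The second, read-only approach might look roughly like this: the service connection holds only a key name, and the value is looked up when the connection is used. The `SecretKey` field suffix and the class and function names are hypothetical conventions chosen for this sketch.

```python
from typing import Mapping


class ReadOnlySecretStore:
    """Hypothetical wrapper over a user-managed store; no write method exists."""

    def __init__(self, secrets: Mapping[str, str]) -> None:
        self._secrets = secrets

    def get(self, key_name: str) -> str:
        return self._secrets[key_name]


def resolve_connection(connection: dict, store: ReadOnlySecretStore) -> dict:
    """Return a copy of the connection with key names swapped for secret values.

    Fields ending in 'SecretKey' (an assumed convention) hold key names.
    """
    suffix = "SecretKey"
    resolved = {}
    for field, value in connection.items():
        if field.endswith(suffix):
            # e.g. 'passwordSecretKey' becomes 'password' with the real value
            resolved[field[: -len(suffix)]] = store.get(value)
        else:
            resolved[field] = value
    return resolved
```

The stored connection never contains the password itself, only the name under which the user's own system keeps it.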

pmbrull avatar Jul 04 '22 15:07 pmbrull