versatile-data-kit
vdk-core: support overriding configs with secrets
Why?
VDK doesn't provide a way to set sensitive configuration such as passwords (for example, trino_password). Currently, the only way to do this is by adding config keys and fetching the values from secrets.
What?
Add a plugin that reconfigures the Configuration object in CoreContext based on secrets. Do this in the initialize_job hook. In this setup, secrets override options set by regular configs. For example, if you set trino_password to "password" in config.ini but also have a secret trino_password="another password", the value of trino_password will be "another password".
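The override rule can be sketched in plain Python. This is only an illustration of the precedence described above, not the plugin's code: the function name `apply_secret_overrides` is hypothetical, and the sketch assumes that only keys already present in the configuration are overridden, which the description does not state.

```python
from typing import Any, Dict


def apply_secret_overrides(
    config_values: Dict[str, Any], secrets: Dict[str, Any]
) -> Dict[str, Any]:
    """Return a copy of config_values in which secrets win on shared keys.

    Hypothetical helper: assumes secrets with no matching config key are ignored.
    """
    merged = dict(config_values)
    for key, value in secrets.items():
        if key in merged:
            merged[key] = value
    return merged


# Values from config.ini and from the secrets store, as in the example above.
from_config_ini = {"trino_password": "password", "trino_user": "alice"}
from_secrets = {"trino_password": "another password", "unrelated_secret": "x"}

effective = apply_secret_overrides(from_config_ini, from_secrets)
print(effective["trino_password"])
```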
Note: The Configuration class and the CoreContext class are annotated with @dataclass(frozen=True). This enforces encapsulation, so in order to mutate the Configuration object after it's created, we have to add more public methods to the Configuration class.
https://docs.python.org/3/library/dataclasses.html#frozen-instances
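To illustrate the note: a frozen dataclass rejects rebinding of its fields, but a method on the class can still mutate a container that a field refers to. The class `FrozenConfig` and the method `override_value` below are hypothetical stand-ins, not the actual Configuration API.

```python
from dataclasses import FrozenInstanceError, dataclass, field
from typing import Any, Dict


@dataclass(frozen=True)
class FrozenConfig:
    """Hypothetical stand-in for a frozen configuration class."""

    values: Dict[str, Any] = field(default_factory=dict)

    def get_value(self, key: str) -> Any:
        return self.values.get(key)

    def override_value(self, key: str, value: Any) -> None:
        # Mutates the dict in place; the field itself is never rebound,
        # so frozen=True does not object.
        self.values[key] = value


config = FrozenConfig({"trino_password": "password"})
config.override_value("trino_password", "another password")

try:
    # Rebinding a field on a frozen instance raises FrozenInstanceError.
    config.values = {}
except FrozenInstanceError:
    print("cannot rebind fields of a frozen dataclass")
```

Exposing a dedicated method keeps the mutation explicit, rather than having plugins reach into the underlying dictionary.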
How was this tested?
Functional test CI/CD
What kind of change is this?
Feature/non-breaking
I was wondering whether we need to finalize the configuration at vdk_configure. The proposal below is bigger and may require a broader discussion (it is also more costly, so it might not be worth it at this time). Re-reading it, I am not sure it's that beneficial.
So I am ok if we ship the current implementation as it is.
But this is not the first time we've had to create workarounds because configuration options can only be set during that phase. In the JobConfigIniPlugin, for example, we check whether the command being executed is run and then search for the config.ini file.
Perhaps we could allow for more dynamic configuration?
As it does currently, vdk_configure would define the available configuration options, but the actual values could be provided or overridden later by dynamically added providers.
@hookimpl
def vdk_configure(context):
    context.add("team", description, default, ...)
Configuration providers can be dynamically added at any point in the application's lifecycle.
@hookimpl
def run_job(context):
    context.configuration.add_configuration_provider(
        name="secrets",
        get_function=lambda context, key: context.secrets.get_secret(key),
        priority_after="config_ini",  # The default priority is LIFO.
    )
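A minimal sketch of how such a provider chain might resolve values is shown here. Everything in it is an assumption made for illustration: the class `LayeredConfiguration` is hypothetical, `get_function` takes only the key (the proposal also passes the context), and `priority_after="config_ini"` is read as "ranks immediately above config_ini", which the proposal does not spell out.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

Provider = Callable[[str], Optional[Any]]


class LayeredConfiguration:
    """Hypothetical configuration with dynamically added value providers."""

    def __init__(self) -> None:
        self._defaults: Dict[str, Any] = {}
        # Ordered from lowest to highest priority.
        self._providers: List[Tuple[str, Provider]] = []

    def add(self, key: str, default: Any = None) -> None:
        """Declare an option, as vdk_configure would."""
        self._defaults[key] = default

    def add_configuration_provider(
        self, name: str, get_function: Provider, priority_after: Optional[str] = None
    ) -> None:
        entry = (name, get_function)
        if priority_after is None:
            # LIFO default: the most recently added provider wins.
            self._providers.append(entry)
            return
        names = [provider_name for provider_name, _ in self._providers]
        # Raises ValueError if the referenced provider is unknown.
        self._providers.insert(names.index(priority_after) + 1, entry)

    def get_value(self, key: str) -> Any:
        if key not in self._defaults:
            raise KeyError(f"Unknown configuration option: {key}")
        # Ask providers from highest to lowest priority.
        for _, get_function in reversed(self._providers):
            value = get_function(key)
            if value is not None:
                return value
        return self._defaults[key]


config = LayeredConfiguration()
config.add("team", default="default-team")
config.add("trino_password")
config.add_configuration_provider("config_ini", {"trino_password": "password"}.get)
config.add_configuration_provider(
    "secrets", {"trino_password": "another password"}.get, priority_after="config_ini"
)
print(config.get_value("trino_password"))
```

In this reading, option definitions stay fixed after vdk_configure while the sources of values remain open to extension, which is the flexibility the comment above asks for.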
Follow-up https://github.com/vmware/versatile-data-kit/issues/3156
Follow-ups
https://github.com/vmware/versatile-data-kit/issues/3156 https://github.com/vmware/versatile-data-kit/issues/3210