[Feature Request]: environment variable interpolation with HOP_RUN_PARAMETERS and a shell script workaround

Open kriko opened this issue 1 year ago • 0 comments

What would you like to happen?

When orchestrating Apache Hop in a containerized environment (for example, running the Hop Docker container in Kubernetes), it is common practice to supply sensitive parameters (e.g. credentials such as usernames and passwords) and environment-specific parameters (database URL, schema name, etc.) through environment variables.

Using environment variables is best practice, since they can be set separately and managed easily at the orchestration level.

Currently, Apache Hop supports the use of environment variables (accessible in shell/bash), but each one must be redefined, together with its value, inside the HOP_RUN_PARAMETERS environment variable. Only then can the variables referenced in the project configuration resolve to those values.

This makes orchestration cumbersome, because every environment variable must be defined a second time in HOP_RUN_PARAMETERS.
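As a rough illustration of the duplication described above, the snippet below sets two variables and then repeats them in HOP_RUN_PARAMETERS. The names DB_URL and DB_USER and their values are hypothetical, and the KEY="value" list format is taken from the example given later in this issue.

```shell
# Hypothetical names and values, for illustration only.
export DB_URL="jdbc:postgresql://db.example.internal:5432/etl"
export DB_USER="etl_user"

# Each variable already present in the environment is listed a second time.
export HOP_RUN_PARAMETERS="DB_URL=\"${DB_URL}\",DB_USER=\"${DB_USER}\""
```

Every new variable added at the orchestration level needs a matching entry in that last line.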

One workaround that some software solutions use is a predefined prefix for environment variables: every environment variable whose name matches that prefix is automatically forwarded to the application. Kubernetes supports this, with configMap and secret both having the "prefix" option.

For our solution, we had to implement a way for the Hop container to find all environment variables referenced in the configuration files. On container start, a shell script then defines HOP_RUN_PARAMETERS on the fly, using the variable names found in the configuration files and the values found in the shell environment.

Another approach would have been to pass every environment variable defined in the shell to HOP_RUN_PARAMETERS, but that is not ideal, because Hop would then have access to environment variables that it should perhaps not be able to see.

This was discussed in the Hop Mattermost chat channel.

A solution or a workaround to this problem can be found here: https://gist.github.com/kriko/7267b91ff18eebdcd1a456921f0f2fd9

Apache Hop run-with-parameters.sh

The entrypoint to this image is run-with-parameters.sh.

When launched, the following files will be processed:

  1. $HOP_PROJECT_FOLDER/$HOP_PROJECT_CONFIG_FILE_NAME - by default /files/project-config.json
  2. $HOP_ENVIRONMENT_CONFIG_FILE_NAME_PATHS - comma separated list of environment config files

All environment variables (written as either $VARIABLE or ${VARIABLE}) found in those files will be assigned to the HOP_RUN_PARAMETERS environment variable as a list of keys and values.

So for example, if /files/project-config.json contains environment variables:

  • $MY_ENVIRONMENT_VARIABLE which corresponds to the value MY_VALUE
  • ${OTHER_ENV} which corresponds to the value OTHER VALUE

Then the environment variable HOP_RUN_PARAMETERS will be assigned the value of: MY_ENVIRONMENT_VARIABLE="MY_VALUE",OTHER_ENV="OTHER VALUE".
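The behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the script from the gist: the function names extract_variable_names and build_run_parameters are hypothetical, and the sketch assumes that variables missing from the environment are simply skipped.

```shell
# Print each distinct variable name referenced as $NAME or ${NAME}
# in the given files, one per line.
extract_variable_names() {
  grep -ohE '\$\{?[A-Za-z_][A-Za-z0-9_]*\}?' "$@" 2>/dev/null \
    | sed -E 's/^\$\{?//; s/\}$//' \
    | LC_ALL=C sort -u
}

# Build a comma-separated list of NAME="value" pairs for every
# referenced variable that is present in the environment.
build_run_parameters() {
  params=""
  for name in $(extract_variable_names "$@"); do
    # printenv exits non-zero when the variable is not in the environment.
    value=$(printenv "$name") || continue
    params="${params:+$params,}${name}=\"${value}\""
  done
  printf '%s\n' "$params"
}
```

An entrypoint could then export HOP_RUN_PARAMETERS="$(build_run_parameters /files/project-config.json)" before starting Hop. Values that themselves contain commas or double quotes are not escaped in this sketch.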

A preferred approach

A better approach would be to drop the intermediate shell script and give Apache Hop direct access to environment variables through a configuration option (perhaps itself an environment variable), with access controlled by a prefix.

For example, when:

  • HOP_ENV_PREFIX is set to ETL_ then all environment variables beginning with ETL_ are directly accessible by the Hop job.
  • HOP_ENV_PREFIX is set to `` (an empty string, so ENV is defined, but empty) then all environment variables in the shell environment are directly accessible by the Hop job.
  • HOP_ENV_PREFIX is not set at all then Hop behaves as it does today: it can access only the variables explicitly defined in HOP_RUN_PARAMETERS.
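
Until such an option exists, the three cases above could be approximated in a wrapper script. The sketch below is only an emulation: HOP_ENV_PREFIX is the option proposed in this issue, not an existing Hop setting, and the function name prefixed_run_parameters is hypothetical. It also assumes that HOP_ENV_PREFIX and HOP_RUN_PARAMETERS themselves should never be forwarded.

```shell
# Emulates the proposed HOP_ENV_PREFIX semantics by building a
# HOP_RUN_PARAMETERS value from the matching environment variables.
prefixed_run_parameters() {
  # Prefix not set at all: keep the current HOP_RUN_PARAMETERS as-is.
  if [ -z "${HOP_ENV_PREFIX+x}" ]; then
    printf '%s\n' "${HOP_RUN_PARAMETERS:-}"
    return 0
  fi
  params=""
  # awk's ENVIRON array yields the exact names of all environment variables.
  for name in $(awk 'BEGIN { for (k in ENVIRON) print k }' | LC_ALL=C sort); do
    # Assumption: never forward the control variables themselves.
    case "$name" in
      HOP_ENV_PREFIX|HOP_RUN_PARAMETERS) continue ;;
    esac
    # An empty prefix matches every name.
    case "$name" in
      "${HOP_ENV_PREFIX}"*) ;;
      *) continue ;;
    esac
    value=$(printenv "$name") || continue
    params="${params:+$params,}${name}=\"${value}\""
  done
  printf '%s\n' "$params"
}
```

With HOP_ENV_PREFIX=ETL_, only variables such as ETL_DB_URL would be forwarded, which keeps unrelated secrets in the container out of Hop's reach.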

Issue Priority

Priority: 3

Issue Component

Component: Hop Run

kriko, Feb 16 '24 13:02