[Feature Request]: environment variable interpolation with HOP_RUN_PARAMETERS and a shell script workaround
What would you like to happen?
When orchestrating Apache Hop in a containerized environment (for example, running the Hop Docker container in Kubernetes), it is common practice to supply sensitive parameters (e.g. credentials such as usernames and passwords) and environment-specific parameters (database URL, schema name, etc.) through environment variables, as is standard for containerized solutions deployed on Kubernetes.
This is considered best practice, since environment variables can be set separately from the image and managed easily at the orchestration level.
Currently, Apache Hop can use environment variables (accessible in the shell), but each one must be redefined, together with its value, inside the `HOP_RUN_PARAMETERS` environment variable itself. Only then can the variables referenced in the project configuration resolve to those values.
This makes orchestration cumbersome, because every environment variable has to be defined a second time in `HOP_RUN_PARAMETERS`.
One approach that some software uses is a predefined prefix for environment variables: every variable whose name matches the prefix is automatically forwarded to the application. Kubernetes supports this on the supplying side, since a ConfigMap or Secret loaded through `envFrom` can have a `prefix` prepended to all of its keys.
For our solution, we had to implement a way for the Hop container to find every environment variable referenced in the configuration files. On container start, a shell script builds `HOP_RUN_PARAMETERS` on the fly, taking the variable names from the configuration files and their values from the shell environment.
Another option would have been to pass every environment variable defined in the shell to `HOP_RUN_PARAMETERS`, but that is not ideal, because Hop would then have access to other environment variables that it should not necessarily be able to read.
This was discussed in the Hop Mattermost chat channel.
Apache Hop run-with-parameters.sh
The entrypoint of this image is `run-with-parameters.sh`.
When launched, the script processes the following files:

- `$HOP_PROJECT_FOLDER/$HOP_PROJECT_CONFIG_FILE_NAME`: by default `/files/project-config.json`
- `$HOP_ENVIRONMENT_CONFIG_FILE_NAME_PATHS`: a comma-separated list of environment config files
All environment variables (either `$VARIABLE` or `${VARIABLE}`) found in those files are assigned to the `HOP_RUN_PARAMETERS` environment variable as a list of keys and values.
For example, if `/files/project-config.json` contains the environment variables:

- `$MY_ENVIRONMENT_VARIABLE`, whose value is `MY_VALUE`
- `${OTHER_ENV}`, whose value is `OTHER VALUE`

then the environment variable `HOP_RUN_PARAMETERS` is assigned the value `MY_ENVIRONMENT_VARIABLE="MY_VALUE",OTHER_ENV="OTHER VALUE"`.
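The script's source is not included in this issue, so the following Python sketch only models the behaviour described above: collect the config file paths, extract every `$NAME` / `${NAME}` reference, and format the matching environment values as `NAME="VALUE"` pairs joined by commas. The function names are hypothetical, and two details are assumptions rather than documented behaviour: names with no value in the environment are skipped (Hop's own `${...}` variables can appear in the same files), and quotes or commas inside values are not escaped.

```python
import os
import re
from typing import Iterable, List, Mapping

# Default project config path quoted in the README above.
DEFAULT_PROJECT_CONFIG = "/files/project-config.json"

# Matches ${NAME} (group 1) or $NAME (group 2), using shell identifier rules.
_VAR_PATTERN = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}|\$([A-Za-z_][A-Za-z0-9_]*)")


def config_file_paths(env: Mapping[str, str]) -> List[str]:
    """Project config file followed by the comma-separated environment config files."""
    folder = env.get("HOP_PROJECT_FOLDER")
    name = env.get("HOP_PROJECT_CONFIG_FILE_NAME")
    # Assumption: fall back to the documented default when either part is unset.
    paths = [f"{folder}/{name}" if folder and name else DEFAULT_PROJECT_CONFIG]
    extra = env.get("HOP_ENVIRONMENT_CONFIG_FILE_NAME_PATHS", "")
    paths.extend(p.strip() for p in extra.split(",") if p.strip())
    return paths


def extract_variable_names(texts: Iterable[str]) -> List[str]:
    """Unique variable names, in order of first appearance across all texts."""
    seen = {}  # dicts preserve insertion order, so this acts as an ordered set
    for text in texts:
        for braced, bare in _VAR_PATTERN.findall(text):
            seen.setdefault(braced or bare, None)
    return list(seen)


def build_run_parameters(names: Iterable[str], env: Mapping[str, str]) -> str:
    """Comma-separated NAME="VALUE" pairs; names missing from env are skipped (assumption)."""
    return ",".join(f'{name}="{env[name]}"' for name in names if name in env)


def main() -> None:
    texts = []
    for path in config_file_paths(os.environ):
        if os.path.isfile(path):
            with open(path, encoding="utf-8") as handle:
                texts.append(handle.read())
    print(build_run_parameters(extract_variable_names(texts), os.environ))


if __name__ == "__main__":
    main()
```

Assuming Python were available in the image, an entrypoint could export the result before starting Hop, e.g. `export HOP_RUN_PARAMETERS="$(python3 build_hop_run_parameters.py)"` (the file name is hypothetical).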
A preferred approach
A better approach would be to drop the intermediate shell script and give Apache Hop direct access to environment variables through a configuration option (perhaps an environment variable itself), filtered by a prefix.
For example:

- if `HOP_ENV_PREFIX` is set to `ETL_`, then all environment variables beginning with `ETL_` are directly accessible by the Hop job;
- if `HOP_ENV_PREFIX` is set to an empty string (`HOP_ENV_PREFIX=""`, so the variable is defined but empty), then all environment variables in the shell environment are directly accessible by the Hop job;
- if `HOP_ENV_PREFIX` is not set at all, Hop keeps working as it does today: it cannot access any variables other than those explicitly defined in `HOP_RUN_PARAMETERS`.
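The three cases above reduce to a single filter over the environment. The sketch below is illustrative only: Hop itself is written in Java, `HOP_ENV_PREFIX` is the setting proposed in this request rather than an existing option, `exposed_variables` is a hypothetical name, and whether the prefix should be stripped from the exposed names is left open (here names are kept unchanged).

```python
import os
from typing import Dict, Mapping

# Setting name proposed in this feature request (not an existing Hop option).
PREFIX_VARIABLE = "HOP_ENV_PREFIX"


def exposed_variables(env: Mapping[str, str]) -> Dict[str, str]:
    """Environment variables the Hop job could read directly under the proposal.

    - HOP_ENV_PREFIX unset     -> nothing extra; only HOP_RUN_PARAMETERS applies (today's behaviour)
    - HOP_ENV_PREFIX == ""     -> every variable in the environment
    - HOP_ENV_PREFIX == "ETL_" -> only variables whose names start with "ETL_"
    """
    if PREFIX_VARIABLE not in env:
        return {}
    prefix = env[PREFIX_VARIABLE]
    # str.startswith("") is always True, so an empty prefix exposes everything.
    return {name: value for name, value in env.items() if name.startswith(prefix)}


if __name__ == "__main__":
    print(sorted(exposed_variables(os.environ)))
```

Distinguishing "unset" from "set but empty" is what lets the empty string act as an explicit opt-in to full access while leaving the default unchanged.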
Issue Priority
Priority: 3
Issue Component
Component: Hop Run