
How to set environment variables for pipeline stages

Open · dvaldivia opened this issue 2 years ago

I'm trying to figure out how to set environment variables when creating a pipeline. My use case is S3: I need to pass the credentials down to tensorflow-io through environment variables. In previous TFX releases, when using reusable Kubeflow components, I could simply add the environment variables. But if I use the built-in components and compile them to a package via KubeflowDagRunnerConfig, I don't see how to set environment variables for individual stages.

Is the only option to wrap TFX native components in function components?

dvaldivia, Jul 18 '22

Thanks for the question. It looks like a duplicate of https://github.com/tensorflow/tfx/issues/3326, and I believe @ConverJens is working on this in https://github.com/tensorflow/tfx/pull/4861.

jiyongjung0, Jul 22 '22

@dvaldivia I don't think it's possible to set any k8s value (resources, env vars, secrets, etc.) on individual steps, but you can set them at the pipeline level through the pipeline_operator_funcs argument of KubeflowDagRunnerConfig:

kubeflow_dag_runner.KubeflowDagRunnerConfig(
    pipeline_operator_funcs=[your_func_to_set_env_vars_using_k8s_api(env_var_name, env_var_val)]
)

See my answer on this issue: https://github.com/tensorflow/tfx/issues/3194
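As far as I can tell, KubeflowDagRunner applies every func in pipeline_operator_funcs to each step's ContainerOp, so one hedged way to get closer to per-step env vars is to filter on the op's name inside the func itself. The sketch below assumes the KFP v1 ContainerOp API (container_op.name, container_op.container.add_env_variable) and the kubernetes client's V1EnvVar. env_var_operator and only_steps are hypothetical names, and the name match assumes the op name still contains the component id (KFP may sanitize it, e.g. lowercase it).

```python
from typing import Callable, Iterable, Mapping, Optional


def env_var_operator(
    env: Mapping[str, str],
    only_steps: Optional[Iterable[str]] = None,
) -> Callable:
    """Return an operator func that adds `env` to a step's container.

    Hypothetical helper, not part of TFX or KFP. Assumes the KFP v1
    ContainerOp API and the `kubernetes` Python client used by KFP v1.
    """
    # Lower-cased substrings to look for in op names; None means "every step".
    targets = None if only_steps is None else [s.lower() for s in only_steps]

    def _apply(container_op):
        op_name = container_op.name.lower()
        if targets is not None and not any(t in op_name for t in targets):
            return container_op  # not a targeted step: leave it unchanged
        # Imported here so the helper itself can be defined without the client.
        from kubernetes.client import V1EnvVar

        for name, value in env.items():
            container_op.container.add_env_variable(V1EnvVar(name=name, value=value))
        return container_op

    return _apply
```

KubeflowDagRunnerConfig appears to fall back to kubeflow_dag_runner.get_default_pipeline_operator_funcs() only when pipeline_operator_funcs is not given, so to keep the defaults you would likely pass get_default_pipeline_operator_funcs() + [env_var_operator({"S3_ENDPOINT": "..."}, only_steps=["Trainer"])]. Plain values like these end up in the compiled pipeline spec, so real credentials are better sourced from a Kubernetes Secret.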

cc @jiyongjung0

ConverJens, Aug 25 '22

@dvaldivia,

I see that PR #4861 has been merged; it addresses this issue by enabling environment variables in beam_args through placeholders. It will be part of the upcoming release. Please try the latest nightly build of TFX and let us know whether it resolves your issue. Thank you!

singhniraj08, Nov 04 '22