tfx
tfx copied to clipboard
WIP: Enable env vars in beam_args through placeholders
@jiyongjung0 The build is failing on building some dependency. Could you have a look? This does not happen for me when I build the TFX image and whls locally.
@ConverJens Thank you again for spotting this. I forgot to update the ci-build for the pypi issue, too. Fixed in https://github.com/tensorflow/tfx/commit/d5be11f2ea6745fb87b0d86313892d176207e506.
Could you rebase and retry?
@jiyongjung0 I rebased agains master but it seems like the issue remained.
@ConverJens Thank you for the test. It seems like an issue with Cython. I'm trying to make a workaround for this. Please refer https://github.com/tensorflow/tfx/pull/4870 until it is merged.
FYI, https://github.com/tensorflow/tfx/commit/517441dbed44467ab20d66da1dd9a3ca3896ba62 submitted with a workaround for the Cython issue.
@ConverJens Please let @kennethyang404 and me know when the PR is ready to review.
Thanks!
Hi @jiyongjung0,
I've occupied by other stuff will try to make some progress on this PR this week if I can manage. Will let you know.
@jiyongjung0 The build fails on pre-existing proto lint failures now.
@ConverJens Thank you for the report. You can ignore it for now although you might not be able to run lint checkers. Let me take a look whether there is a way to suppress the linter.
@jiyongjung0 @kennethyang404 I have created a unit test for the beam placeholder logic but it currently fails with the very unhelpful message: "not a cmessage"
Any help understanding why this is happening would be appreciated. My only guess is that I haven't compiled the protos locally and thus, that the updated executor_spec.proto is not included. Could that be the case?
It seems like an inconsistency of types. Because the failed line adds arguments to a proto, but the given item is a list containing EnvironmentVariablePlaceholder
python object.
By the way, I am afraid that changing the type of a proto message field might be dangerous in most cases.
@jiyongjung0 It was, I used the wrong placeholder type.
As of now, if I only use placeholders (no beam args as strings), the compilation succeeds but the component fails upon execution. @kennethyang404 When we discussed this during our design session, was there any other additional steps that you believe is needed to accommodate both strings and placeholders? It seems to me as either the placeholders needs to be serialized to strings and stored as strings in the proto or the plain strings might need to be wrappen in a placeholder or similar. WDYT?
Hi,
I couldn't add inline comment in the files, so I will just write down a couple things I noticed here:
-
I think
EnvironmentVariablePlaceholder
should inherits fromPlaceholder
instead of_ProtoAccessiblePlaceholder
, because env variable is just plain value, not proto. -
A placeholder needs to be serialized to be stored in a
Placeholder
proto. I think at [1] you probably need something like
for arg in self.beam_pipeline_args:
if isinstance(arg, str):
# build a PlaceholderExpression proto that only stores a string. [2]
elif isinstance(arg, ph.Placeholder):
# call arg.encode() to convert a Placeholder object into a PlaceholderExpression proto.
else:
raise ValueError('Unsupported arg type.')
- In the executor, a
PlaceholderExpression
proto needs to be evaluated to the actual value. I think at [3] (but not 100% sure), you probably need to callresolve_placeholder_expression
[4] to evaluate that proto into a primitive value.
[1] https://github.com/tensorflow/tfx/blob/8e05d0758fc68c68ea89578b5170e1ef150c9ab5/tfx/dsl/components/base/executor_spec.py#L159
[2] https://github.com/tensorflow/tfx/blob/b88e79c98718f620f5ebeef087cda621c33c6495/tfx/proto/orchestration/placeholder.proto#L28
[3] https://github.com/tensorflow/tfx/blob/b88e79c98718f620f5ebeef087cda621c33c6495/tfx/orchestration/portable/beam_executor_operator.py#L70
[4] https://github.com/tensorflow/tfx/blob/b88e79c98718f620f5ebeef087cda621c33c6495/tfx/dsl/compiler/placeholder_utils.py#L71
@kennethyang404 Thanks for the input!
I have made some progress and beam args can now be both placeholder and string.
Right now the issue is that the resolving fails which I believe is erroneous resolving logic in placeholder_utils. Basically the context is a function and the resolving tries:
context[placeholder.key]
which of course fails since the context is not subscriptable.
@kennethyang404 @jiyongjung0 I think I got it working now. Just some testing left.
@jiyongjung0 @kennethyang404 Now everything works as expected. String, placeholder and string + placeholder inputs works as expected which means one can now do:
pipeline_py.Pipeline(
pipeline_name='test_pipeline',
pipeline_root=pipeline_root_path,
metadata_connection_config=sqlite_metadata_connection_config(
metadata_path),
components=[dummy_beam_component],
beam_pipeline_args=['--runner=DirectRunner',
'--direct_running_mode=' + ph.Placeholder(
placeholder_type=placeholder_pb2.Placeholder.ENVIRONMENT_VARIABLE,
key=direct_running_mode_env_var_name),
ph.Placeholder(placeholder_type=placeholder_pb2.Placeholder.ENVIRONMENT_VARIABLE,
key=num_workers_env_var_name)
],
)
The combination of string and placeholder can be used to supply passwords mounted from secrets like so:
s3_secret_access_key_env_var_name = 'S3_SECRET_ACCESS_KEY'
'--s3_secret_access_key=' + ph.Placeholder(
placeholder_type=placeholder_pb2.Placeholder.ENVIRONMENT_VARIABLE,
key=s3_secret_access_key_env_var_name)
where one has mounted the secret key to the environment variable S3_SECRET_ACCESS_KEY
@ConverJens Can you please check @kennethyang404's comments and keep us posted ? Thank you!
@gbaned I'm have had no time for the last week's and won't until mid to end of August. Will check the comments and make adjustments then. This hasn't been forgotten 🙂
Cc: @kennethyang404
It looks like there is also an issue with the CLA, although it looks like it might be Jens using two different accounts?
@rcrowe-google Yep, I have messed up my gitconfig and commited using another email. Will try to sort it out.
@kennethyang404 Fixed all comments according to your comments. Build fails on pre-existing protolint issues.
@rcrowe-google CLA is fixed.
Looks like we're getting close! A few proto errors caught by protolint.
@rcrowe-google Indeed we are! I think the proto errors were preexisting and not related to my changes though. Correct me if I'm wrong.
@kennethyang404 Could you give it another round of review? I believe that we are finished now :)
cc @jiyongjung0 @rcrowe-google
@kennethyang404 No worries! Hope you had a great vacation and thanks for the comment. I have amended it and the build is now passing. Let me know if there is anything else!
cc @jiyongjung0 @rcrowe-google
@kennethyang404 Awsome! Anything else needed from me?
@jiyongjung0 You are also set as a reviewer on this PR but i don't know if you also have to approve?
@ConverJens We only need one approval, but I'm happy to approve this again. :) Now tfx team will move this PR into our internal review system and start the submission process. Thanks!
@ConverJens We only need one approval, but I'm happy to approve this again. :) Now tfx team will move this PR into our internal review system and start the submission process. Thanks!
Awsome, thanks!