tfx icon indicating copy to clipboard operation
tfx copied to clipboard

WIP: Enable env vars in beam_args through placeholders

Open ConverJens opened this issue 2 years ago • 19 comments

ConverJens avatar May 12 '22 14:05 ConverJens

@jiyongjung0 The build is failing on building some dependency. Could you have a look? This does not happen for me when I build the TFX image and whls locally.

ConverJens avatar May 12 '22 14:05 ConverJens

@ConverJens Thank you again for spotting this. I forgot to update the ci-build for the pypi issue, too. Fixed in https://github.com/tensorflow/tfx/commit/d5be11f2ea6745fb87b0d86313892d176207e506.

Could you rebase and retry?

jiyongjung0 avatar May 13 '22 01:05 jiyongjung0

@jiyongjung0 I rebased agains master but it seems like the issue remained.

ConverJens avatar May 13 '22 07:05 ConverJens

@ConverJens Thank you for the test. It seems like an issue with Cython. I'm trying to make a workaround for this. Please refer https://github.com/tensorflow/tfx/pull/4870 until it is merged.

jiyongjung0 avatar May 13 '22 09:05 jiyongjung0

FYI, https://github.com/tensorflow/tfx/commit/517441dbed44467ab20d66da1dd9a3ca3896ba62 submitted with a workaround for the Cython issue.

jiyongjung0 avatar May 20 '22 08:05 jiyongjung0

@ConverJens Please let @kennethyang404 and me know when the PR is ready to review.

Thanks!

jiyongjung0 avatar Jun 07 '22 07:06 jiyongjung0

Hi @jiyongjung0,

I've occupied by other stuff will try to make some progress on this PR this week if I can manage. Will let you know.

ConverJens avatar Jun 13 '22 13:06 ConverJens

@jiyongjung0 The build fails on pre-existing proto lint failures now.

ConverJens avatar Jun 13 '22 13:06 ConverJens

@ConverJens Thank you for the report. You can ignore it for now although you might not be able to run lint checkers. Let me take a look whether there is a way to suppress the linter.

jiyongjung0 avatar Jun 13 '22 23:06 jiyongjung0

@jiyongjung0 @kennethyang404 I have created a unit test for the beam placeholder logic but it currently fails with the very unhelpful message: "not a cmessage"

Any help understanding why this is happening would be appreciated. My only guess is that I haven't compiled the protos locally and thus, that the updated executor_spec.proto is not included. Could that be the case?

ConverJens avatar Jun 14 '22 13:06 ConverJens

It seems like an inconsistency of types. Because the failed line adds arguments to a proto, but the given item is a list containing EnvironmentVariablePlaceholder python object.

By the way, I am afraid that changing the type of a proto message field might be dangerous in most cases.

jiyongjung0 avatar Jun 15 '22 03:06 jiyongjung0

@jiyongjung0 It was, I used the wrong placeholder type.

As of now, if I only use placeholders (no beam args as strings), the compilation succeeds but the component fails upon execution. @kennethyang404 When we discussed this during our design session, was there any other additional steps that you believe is needed to accommodate both strings and placeholders? It seems to me as either the placeholders needs to be serialized to strings and stored as strings in the proto or the plain strings might need to be wrappen in a placeholder or similar. WDYT?

ConverJens avatar Jun 20 '22 09:06 ConverJens

Hi,

I couldn't add inline comment in the files, so I will just write down a couple things I noticed here:

  1. I think EnvironmentVariablePlaceholder should inherits from Placeholder instead of _ProtoAccessiblePlaceholder, because env variable is just plain value, not proto.

  2. A placeholder needs to be serialized to be stored in a Placeholder proto. I think at [1] you probably need something like

for arg in self.beam_pipeline_args:
  if isinstance(arg, str):
    # build a PlaceholderExpression proto that only stores a string. [2]
  elif isinstance(arg, ph.Placeholder):
    # call arg.encode() to convert a Placeholder object into a PlaceholderExpression proto.
  else:
    raise ValueError('Unsupported arg type.')
  1. In the executor, a PlaceholderExpression proto needs to be evaluated to the actual value. I think at [3] (but not 100% sure), you probably need to call resolve_placeholder_expression [4] to evaluate that proto into a primitive value.

[1] https://github.com/tensorflow/tfx/blob/8e05d0758fc68c68ea89578b5170e1ef150c9ab5/tfx/dsl/components/base/executor_spec.py#L159

[2] https://github.com/tensorflow/tfx/blob/b88e79c98718f620f5ebeef087cda621c33c6495/tfx/proto/orchestration/placeholder.proto#L28

[3] https://github.com/tensorflow/tfx/blob/b88e79c98718f620f5ebeef087cda621c33c6495/tfx/orchestration/portable/beam_executor_operator.py#L70

[4] https://github.com/tensorflow/tfx/blob/b88e79c98718f620f5ebeef087cda621c33c6495/tfx/dsl/compiler/placeholder_utils.py#L71

kennethyang404 avatar Jun 29 '22 15:06 kennethyang404

@kennethyang404 Thanks for the input!

I have made some progress and beam args can now be both placeholder and string.

Right now the issue is that the resolving fails which I believe is erroneous resolving logic in placeholder_utils. Basically the context is a function and the resolving tries: context[placeholder.key] which of course fails since the context is not subscriptable.

ConverJens avatar Jul 01 '22 16:07 ConverJens

@kennethyang404 @jiyongjung0 I think I got it working now. Just some testing left.

ConverJens avatar Jul 02 '22 08:07 ConverJens

@jiyongjung0 @kennethyang404 Now everything works as expected. String, placeholder and string + placeholder inputs works as expected which means one can now do:

pipeline_py.Pipeline(
        pipeline_name='test_pipeline',
        pipeline_root=pipeline_root_path,
        metadata_connection_config=sqlite_metadata_connection_config(
            metadata_path),
        components=[dummy_beam_component],
        beam_pipeline_args=['--runner=DirectRunner',
                            '--direct_running_mode=' + ph.Placeholder(
                                placeholder_type=placeholder_pb2.Placeholder.ENVIRONMENT_VARIABLE,
                                key=direct_running_mode_env_var_name),
                            ph.Placeholder(placeholder_type=placeholder_pb2.Placeholder.ENVIRONMENT_VARIABLE,
                                           key=num_workers_env_var_name)
                            ],
    )

The combination of string and placeholder can be used to supply passwords mounted from secrets like so:

s3_secret_access_key_env_var_name = 'S3_SECRET_ACCESS_KEY'
'--s3_secret_access_key=' + ph.Placeholder(
                                placeholder_type=placeholder_pb2.Placeholder.ENVIRONMENT_VARIABLE,
                                key=s3_secret_access_key_env_var_name)

where one has mounted the secret key to the environment variable S3_SECRET_ACCESS_KEY

ConverJens avatar Jul 03 '22 08:07 ConverJens

@ConverJens Can you please check @kennethyang404's comments and keep us posted ? Thank you!

gbaned avatar Jul 29 '22 14:07 gbaned

@gbaned I'm have had no time for the last week's and won't until mid to end of August. Will check the comments and make adjustments then. This hasn't been forgotten 🙂

Cc: @kennethyang404

ConverJens avatar Jul 31 '22 16:07 ConverJens

It looks like there is also an issue with the CLA, although it looks like it might be Jens using two different accounts?

rcrowe-google avatar Aug 10 '22 18:08 rcrowe-google

@rcrowe-google Yep, I have messed up my gitconfig and commited using another email. Will try to sort it out.

ConverJens avatar Aug 22 '22 16:08 ConverJens

@kennethyang404 Fixed all comments according to your comments. Build fails on pre-existing protolint issues.

ConverJens avatar Aug 22 '22 16:08 ConverJens

@rcrowe-google CLA is fixed.

ConverJens avatar Aug 22 '22 17:08 ConverJens

Looks like we're getting close! A few proto errors caught by protolint.

rcrowe-google avatar Aug 22 '22 18:08 rcrowe-google

@rcrowe-google Indeed we are! I think the proto errors were preexisting and not related to my changes though. Correct me if I'm wrong.

ConverJens avatar Aug 22 '22 18:08 ConverJens

@kennethyang404 Could you give it another round of review? I believe that we are finished now :)

cc @jiyongjung0 @rcrowe-google

ConverJens avatar Aug 25 '22 07:08 ConverJens

@kennethyang404 No worries! Hope you had a great vacation and thanks for the comment. I have amended it and the build is now passing. Let me know if there is anything else!

cc @jiyongjung0 @rcrowe-google

ConverJens avatar Sep 06 '22 12:09 ConverJens

@kennethyang404 Awsome! Anything else needed from me?

ConverJens avatar Sep 06 '22 12:09 ConverJens

@jiyongjung0 You are also set as a reviewer on this PR but i don't know if you also have to approve?

ConverJens avatar Sep 06 '22 13:09 ConverJens

@ConverJens We only need one approval, but I'm happy to approve this again. :) Now tfx team will move this PR into our internal review system and start the submission process. Thanks!

jiyongjung0 avatar Sep 07 '22 01:09 jiyongjung0

@ConverJens We only need one approval, but I'm happy to approve this again. :) Now tfx team will move this PR into our internal review system and start the submission process. Thanks!

Awsome, thanks!

ConverJens avatar Sep 07 '22 07:09 ConverJens