tfx trainer job failed on the GCP AI Platform

Open drzl386 opened this issue 5 years ago • 5 comments

When I run the TFX taxi pipeline and the Trainer component submits its job to AI Platform, the job fails with this error:

"ValueError: Expect custom_config to be a dict but got <class 'str'> instead".

According to the notes, custom_config should be assumed to be a JSON-serialized string, yet the error says a dict is expected.
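To make the str-versus-dict distinction concrete, here is a minimal sketch using only the standard `json` module. It does not reproduce TFX's own code: the `TRAINING_ARGS_KEY` value, the project and region strings, and the `require_dict` helper are all hypothetical stand-ins, and the check merely imitates the kind of type test that would raise the message quoted above.

```python
import json

# Hypothetical placeholder; the thread uses
# ai_platform_trainer_executor.TRAINING_ARGS_KEY for this key.
TRAINING_ARGS_KEY = "training_args"

# Illustrative values only (assumptions, not taken from the issue).
custom_config = {
    TRAINING_ARGS_KEY: {
        "project": "my-project",
        "region": "us-central1",
        "scaleTier": "BASIC_GPU",
    },
    "embed_size": 64,
}

# A JSON-serialized config is a str; decoding it gives back a dict.
serialized = json.dumps(custom_config)
restored = json.loads(serialized)


def require_dict(value):
    """Hypothetical check that imitates the reported ValueError."""
    if not isinstance(value, dict):
        raise ValueError(
            "Expect custom_config to be a dict but got %s instead" % type(value)
        )
    return value


require_dict(restored)        # passes: a dict
# require_dict(serialized)    # would raise: still a str
```

If the consumer expects a dict, the serialized string has to be decoded with `json.loads` before it is handed over; passing the string itself triggers the type check.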

drzl386 avatar Jul 13 '20 15:07 drzl386

Here is an example of custom config

1025KB avatar Jul 14 '20 00:07 1025KB

@drzl386, Can you please respond to @1025KB's comment. Thanks!

rmothukuru avatar Jul 21 '20 05:07 rmothukuru

Hey there, I followed that example and now get a new error:

AttributeError: 'str' object has no attribute 'get'

I use the following config:

customer_config = {
    ai_platform_trainer_executor.TRAINING_ARGS_KEY: {
        'project': '...',
        'region': '....',
        'scaleTier': 'BASIC_GPU'},
    'embed_size': _embed_size,
    'encoded_hidden_units': _encoded_hidden_units,
    'decoded_hidden_units': _decoded_hidden_units}

Here _embed_size, _encoded_hidden_units, and _decoded_hidden_units are runtime parameters.
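One possible way to hit this AttributeError, sketched below with plain Python, is a config that gets JSON-serialized twice, so that a single decode still yields a string. This is an assumption about the cause, not something confirmed in the thread. The `to_dict` helper and the sample values are hypothetical.

```python
import json

# Illustrative config (values are assumptions, not from the issue).
config = {"training_args": {"project": "my-project"}, "embed_size": 64}

once = json.dumps(config)    # a JSON object encoded as a str
twice = json.dumps(once)     # serialized a second time by mistake

decoded = json.loads(twice)  # one decode returns a str, not a dict

try:
    decoded.get("embed_size")
    error_message = None
except AttributeError as exc:
    error_message = str(exc)


def to_dict(value, max_depth=3):
    """Hypothetical helper: decode nested JSON strings until a dict appears."""
    for _ in range(max_depth):
        if isinstance(value, dict):
            return value
        if not isinstance(value, str):
            break
        value = json.loads(value)
    if isinstance(value, dict):
        return value
    raise TypeError("could not recover a dict from %r" % (value,))


print(error_message)
print(to_dict(twice) == config)
```

Checking `type(...)` of the config at the point where `.get` is called is a quick way to tell whether it arrived as a dict or as a (possibly doubly encoded) string.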

drzl386 avatar Jul 21 '20 15:07 drzl386

I get the same error. Below is my trainer code, which is the same as in the demo templates.

trainer_args = {
    'run_fn': run_fn,
    'transformed_examples': transform.outputs['transformed_examples'],
    'schema': schema_gen.outputs['schema'],
    'transform_graph': transform.outputs['transform_graph'],
    'train_args': train_args,
    'eval_args': eval_args,
    'custom_executor_spec':
        executor_spec.ExecutorClassSpec(trainer_executor.GenericExecutor),
}
if ai_platform_training_args is not None:
  trainer_args.update({
      'custom_executor_spec':
          executor_spec.ExecutorClassSpec(
              ai_platform_trainer_executor.GenericExecutor
          ),
      'custom_config': {
          ai_platform_trainer_executor.TRAINING_ARGS_KEY:
              ai_platform_training_args
      }
  })
trainer = Trainer(**trainer_args)
# TODO(step 6): Uncomment here to add Trainer to the pipeline.
components.append(trainer)

C45513 avatar Jul 23 '20 06:07 C45513

@drzl386 @C45513 Is this issue still valid? Please let us know if it's still an issue. Thanks!

gowthamkpr avatar Aug 10 '22 05:08 gowthamkpr

Closing this issue as it has been stale for more than 2 weeks. Please reopen it if you respond to the issue. Thanks!

gowthamkpr avatar Aug 31 '22 17:08 gowthamkpr
