TFX Trainer job failed on GCP AI Platform
When I run the TFX taxi pipeline and the Trainer component submits its job to AI Platform, the job fails with the error:
"ValueError: Expect custom_config to be a dict but got <class 'str'> instead".
Please read the notes: custom_config should be assumed to be a JSON-serialized string.
Here is an example of a custom config.
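A minimal sketch of what "assume custom_config is a JSON-serialized string" implies for the code that reads it, assuming the value may arrive either as a dict or as its JSON serialization. The helper name load_custom_config is hypothetical and not part of TFX; it decodes a string with json.loads and rejects anything that does not decode to a dict, mirroring the shape of the ValueError quoted above.

```python
import json
from typing import Any, Dict, Optional, Union


def load_custom_config(
        raw: Optional[Union[str, Dict[str, Any]]]) -> Dict[str, Any]:
    """Return custom_config as a dict (hypothetical helper, not a TFX API).

    Accepts either an already-decoded dict or its JSON-serialized string.
    """
    if raw is None:
        return {}
    if isinstance(raw, str):
        raw = json.loads(raw)  # decode the JSON-serialized form
    if not isinstance(raw, dict):
        # Same wording as the error reported in this issue.
        raise ValueError(
            'Expect custom_config to be a dict but got %s instead' % type(raw))
    return raw


if __name__ == '__main__':
    serialized = json.dumps({'embed_size': 64})
    print(load_custom_config(serialized)['embed_size'])  # prints 64
```

Accepting both forms keeps the reader tolerant of either convention; it does not say which form a given TFX version actually passes.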
@drzl386, can you please respond to @1025KB's comment? Thanks!
Hey there, I followed that, and now I get a new error:
AttributeError: 'str' object has no attribute 'get'
I use the config as follows:
customer_config = {
    ai_platform_trainer_executor.TRAINING_ARGS_KEY: {
        'project': '...',
        'region': '....',
        'scaleTier': 'BASIC_GPU'},
    'embed_size': _embed_size,
    'encoded_hidden_units': _encoded_hidden_units,
    'decoded_hidden_units': _decoded_hidden_units}
Here, _embed_size, _encoded_hidden_units, and _decoded_hidden_units are runtime parameters.
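A minimal sketch of why serializing the config can lead to the AttributeError above: once the dict is passed through json.dumps it becomes a str, and any code that calls .get() on it without decoding first fails. Assumptions: the string 'ai_platform_training_args' stands in for ai_platform_trainer_executor.TRAINING_ARGS_KEY, and the project, region, and runtime parameter values are hypothetical placeholders. The sketch uses only the standard json module, not TFX.

```python
import json

# Stand-in for ai_platform_trainer_executor.TRAINING_ARGS_KEY (assumed value).
TRAINING_ARGS_KEY = 'ai_platform_training_args'

# Hypothetical runtime parameters; they must be JSON-serializable
# (ints, floats, strings, lists, dicts) to survive json.dumps.
_embed_size = 64
_encoded_hidden_units = [128, 64]
_decoded_hidden_units = [64, 128]

custom_config = {
    TRAINING_ARGS_KEY: {
        'project': '<project-id>',  # placeholder
        'region': '<region>',       # placeholder
        'scaleTier': 'BASIC_GPU',
    },
    'embed_size': _embed_size,
    'encoded_hidden_units': _encoded_hidden_units,
    'decoded_hidden_units': _decoded_hidden_units,
}

serialized = json.dumps(custom_config)  # now a str, not a dict

# Calling .get() on the serialized string reproduces the reported error.
try:
    serialized.get(TRAINING_ARGS_KEY)
except AttributeError as e:
    print(e)  # 'str' object has no attribute 'get'

# Decoding first restores a dict whose keys can be read normally.
decoded = json.loads(serialized)
print(decoded[TRAINING_ARGS_KEY]['scaleTier'])  # BASIC_GPU
print(decoded['encoded_hidden_units'])          # [128, 64]
```

The sketch only shows that whatever reads the value must decode it before calling dict methods; where that decoding belongs in the TFX code path is not settled in this thread.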
I'm getting the same error. Below is my Trainer code, which is the same as in the demo templates:
trainer_args = {
    'run_fn': run_fn,
    'transformed_examples': transform.outputs['transformed_examples'],
    'schema': schema_gen.outputs['schema'],
    'transform_graph': transform.outputs['transform_graph'],
    'train_args': train_args,
    'eval_args': eval_args,
    'custom_executor_spec':
        executor_spec.ExecutorClassSpec(trainer_executor.GenericExecutor),
}
if ai_platform_training_args is not None:
    trainer_args.update({
        'custom_executor_spec':
            executor_spec.ExecutorClassSpec(
                ai_platform_trainer_executor.GenericExecutor
            ),
        'custom_config': {
            ai_platform_trainer_executor.TRAINING_ARGS_KEY:
                ai_platform_training_args
        }
    })
trainer = Trainer(**trainer_args)
# TODO(step 6): Uncomment here to add Trainer to the pipeline.
components.append(trainer)
@drzl386 @C45513 Is this issue still valid? Can you please let us know if it's still an issue? Thanks!
Closing this issue as it has been stale for more than 2 weeks. Please reopen it if you respond to the issue. Thanks!