MLOpsPython

Model Registration Fails

Open jahanzaibanwar opened this issue 2 years ago • 3 comments

[Screenshot: Web capture_10-3-2022_171817_ml azure com]

I ran the CI pipeline and it worked fine. Everything was working. For two days now I have been stuck on this error. The dataset is the same and everything else is the same, but this error appeared out of nowhere and I am having a hard time understanding why it is happening. Any help would be appreciated. Thank you.

jahanzaibanwar, Mar 10 '22 16:03

In your train file, probably train_model.py, are you tagging the value dataset.id with the key that you are trying to use in register_model.py?

Should be something like this: run.parent.tag("dataset_id", value=dataset.id)

wissamjur, Mar 14 '22 15:03

@wissamjur Yes, I do have that code in my train_aml.py. [image]

jahanzaibanwar, Mar 21 '22 09:03

@jahanzaibanwar Well, first I'd look at parent_tags = run.parent.get_tags(). Try printing it: what do you get in the logs?
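To make a missing tag easier to spot in the logs, the lookup can fail with a message that lists the tags actually present. This is only a sketch: require_tag is a hypothetical helper, not part of the Azure ML SDK or of MLOpsPython, and the dictionary below is made-up data standing in for whatever run.parent.get_tags() returns in your register script.

```python
from typing import Mapping


def require_tag(tags: Mapping[str, str], key: str) -> str:
    """Return tags[key], or raise an error listing the tags that are present."""
    if key not in tags:
        available = ", ".join(sorted(tags)) or "<none>"
        raise KeyError(
            f"tag {key!r} not found on parent run; available tags: {available}"
        )
    return tags[key]


# Stand-in (assumption) for the dict returned by run.parent.get_tags().
parent_tags = {"dataset_id": "1234-abcd", "BuildId": "42"}
print("parent tags:", parent_tags)
dataset_id = require_tag(parent_tags, "dataset_id")
```

If the error reports `<none>`, the parent run has no tags at all, which points at the train step rather than at the key name.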

Make sure your train file successfully completes with run.complete() (after dumping the model, of course).
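The order I have in mind at the end of the train script is roughly: tag the parent run, dump the model, then call run.complete(). A rough sketch of that ordering follows. finish_training is a hypothetical helper, run is assumed to be any object exposing parent.tag(...) and complete() the way an Azure ML run does, and pickle is used here only to keep the example dependency-free (your script may well use joblib instead).

```python
import pickle
from pathlib import Path


def finish_training(run, model, dataset_id: str, model_path: Path) -> None:
    """Tag the parent run, persist the model, then mark the run complete."""
    # Tag first, so the register step can find dataset_id on the parent run.
    run.parent.tag("dataset_id", value=dataset_id)

    # Dump the model before completing the run.
    model_path.parent.mkdir(parents=True, exist_ok=True)
    with model_path.open("wb") as f:
        pickle.dump(model, f)

    # Only now signal that the train step finished.
    run.complete()
```

If an exception is raised before run.complete() is reached, the step should show up as failed in the pipeline instead of silently leaving the parent run untagged.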

If parent_tags is empty, you can also double check that in your build pipeline, you are passing the pipeline_data param:

from azureml.pipeline.steps import PythonScriptStep

# Pass pipeline_data both as a step input and as the --step_input argument.
register_step = PythonScriptStep(
    name="Register Model ",
    script_name=e.register_script_path,
    compute_target=aml_compute,
    source_directory=e.sources_directory_train,
    inputs=[pipeline_data],
    arguments=[
        "--model_name", model_name_param,
        "--step_input", pipeline_data,
    ],
    runconfig=run_config,
    allow_reuse=False,
)

Your pipeline should have the correct steps in order as well:

from azureml.pipeline.core import Pipeline

# Enforce the order: prep, then train, then register.
train_step.run_after(prep_step)
register_step.run_after(train_step)
steps = [prep_step, train_step, register_step]
train_pipeline = Pipeline(workspace=aml_workspace, steps=steps)

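One last thing: I'm not sure whether this applies to your case, but since you mentioned that it did work once, maybe try setting the allow_reuse param to False on all of your pipeline steps. With allow_reuse=True, a step whose script, inputs, and parameters are unchanged can reuse the output of a previous run instead of running again. Depending on your design, that may mean the train step is skipped and the tag is never set on the new parent run.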

wissamjur, Mar 24 '22 22:03