MLClient Job.get function returns JobParsingError

Open Uzzije opened this issue 1 year ago • 6 comments

  • Package Name: azure-ai-ml
  • Package Version: 1.12.1
  • Operating System: Mac OSX
  • Python Version: 3.10

Describe the bug

When calling MLClient.jobs.get(job_name) to get the status of a batch job in Azure ML, an error is raised. The error class is JobParsingError.

To Reproduce

Steps to reproduce the behavior:

  1. Deploy an Azure ML job in an Azure Machine Learning workspace.
  2. Get the batch job name.
  3. Invoke the call to get the job status, as described in the "Execution to Get Job Status" documentation.
  4. Error response is - {'type': 'JobParsingError', 'message': "Expecting (<class 'str'>, <class 'azure.ai.ml.entities._assets.environment.Environment'>) for environment, got <class 'dict'> instead."}

Expected behavior

The function should return a Job object that allows you to access the properties of the job.

Uzzije avatar Jan 12 '24 23:01 Uzzije

Hi @Uzzije. Thanks for reaching out. Can you please let us know how you are submitting the job? Please, share as much code as possible.

santiagxf avatar Jan 16 '24 15:01 santiagxf

Hi team, I am also facing a similar error. It was working until now, and out of nowhere we started facing this issue. model_submit.txt

amalprem avatar Jan 16 '24 15:01 amalprem

Thanks for the feedback! We are routing this to the appropriate team for follow-up. cc @Azure/azure-ml-sdk @azureml-github.

github-actions[bot] avatar Jan 16 '24 23:01 github-actions[bot]

Hi @Uzzije. Thank you for opening this issue and giving us the opportunity to assist. To help our team better understand your issue and the details of your scenario please provide a response to the question asked above or the information requested above. This will help us more accurately address your issue.

github-actions[bot] avatar Jan 16 '24 23:01 github-actions[bot]

Thanks for sharing the details. We are working on identifying the root of the issue. Hopefully we can come back soon with directions.

santiagxf avatar Jan 16 '24 23:01 santiagxf

Hi, I'm running the job in basically the same way @amalprem described. From investigating the issue, the failure seems to occur in azure.ai.ml.entities._builders.command.Command._attr_type_map. The entry "environment": (str, Environment) is receiving a value of type dict, hence this traceback:

Exception: Expecting (<class 'str'>, <class 'azure.ai.ml.entities._assets.environment.Environment'>) for environment, got <class 'dict'> instead..
Traceback (most recent call last):
  File "/opt/conda/envs/myenv/lib/python3.10/site-packages/azure/ai/ml/entities/_job/job.py", line 310, in _from_rest_object
    return PipelineJob._load_from_rest(obj)
  File "/opt/conda/envs/myenv/lib/python3.10/site-packages/azure/ai/ml/entities/_job/pipeline/pipeline_job.py", line 566, in _load_from_rest
    sub_nodes = PipelineComponent._resolve_sub_nodes(properties.jobs) if properties.jobs else {}
  File "/opt/conda/envs/myenv/lib/python3.10/site-packages/azure/ai/ml/entities/_component/pipeline_component.py", line 380, in _resolve_sub_nodes
    sub_nodes[node_name] = pipeline_node_factory.load_from_rest_object(obj=node)
  File "/opt/conda/envs/myenv/lib/python3.10/site-packages/azure/ai/ml/entities/_job/pipeline/_load_component.py", line 268, in load_from_rest_object
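
To illustrate the kind of check that appears to be failing, here is a minimal, self-contained sketch of a type-map validation. It is not the SDK's actual code: `Environment`, `ATTR_TYPE_MAP`, and `validate_attr` are hypothetical stand-ins, and the real logic behind `Command._attr_type_map` may differ.

```python
# Hypothetical, simplified stand-in for an attribute type check.
# None of these names are taken from azure-ai-ml; they are for illustration only.


class Environment:
    """Placeholder for the SDK's Environment entity."""

    def __init__(self, name: str):
        self.name = name


# Maps an attribute name to the tuple of types it is allowed to hold.
ATTR_TYPE_MAP = {"environment": (str, Environment)}


def validate_attr(name, value):
    """Return value unchanged, or raise TypeError if its type is not allowed."""
    allowed = ATTR_TYPE_MAP.get(name)
    if allowed is not None and not isinstance(value, allowed):
        raise TypeError(
            f"Expecting {allowed} for {name}, got {type(value)} instead."
        )
    return value
```

Under this sketch, a string reference or an `Environment` instance passes the check, while a raw dict (which is what the service seems to have returned for the node's environment) is rejected with a message of the same shape as the one in the traceback.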

Uzzije avatar Jan 17 '24 12:01 Uzzije

I'm getting a similar issue. It was working fine just a week ago.

I'm running a batch job: ml_client.batch_endpoints.invoke() returns a "BatchJob" object. Taking that object's "name" property and passing it to ml_client.jobs.get() no longer works and blows up with:

JobParsingError: Expecting (<class 'str'>, <class 'azure.ai.ml.entities._assets.environment.Environment'>) for environment, got <class 'dict'> instead.

Just to add more information, calling MLClient.jobs.list(parent_job_name="batchjob-run-id-here") also throws the same error.
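
For anyone polling job status in a script while this error is occurring, a defensive wrapper along these lines might keep the caller from crashing. This is only a sketch: `get_job_status` is a hypothetical helper, `client` is assumed to be an object exposing `jobs.get(name)` the way `MLClient` does, and the exception is matched by class name so the example does not depend on the SDK being installed.

```python
def get_job_status(client, job_name):
    """Return the job's status, or None if the job could not be parsed.

    `client` is assumed to expose `jobs.get(name)`, as MLClient does.
    """
    try:
        job = client.jobs.get(job_name)
    except Exception as exc:
        # Match by class name so this sketch does not need to import the SDK.
        if type(exc).__name__ == "JobParsingError":
            return None
        # Any other failure is unexpected here, so let it propagate.
        raise
    return job.status
```

A caller could then treat `None` as "status unknown, retry later" rather than as a failed job.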

bluebobbo avatar Jan 17 '24 18:01 bluebobbo

We have identified an issue in our service and a patch has been applied. We are rolling out the update to all regions as we speak, and you can expect the rollout to be completed by tomorrow, except for the East US 2 region, which will be patched the day after. Apologies for the inconvenience. I will update this thread once the deployment is completed.

santiagxf avatar Jan 18 '24 17:01 santiagxf

The fix has been deployed to all public regions. Customers should no longer experience this problem.

PengSY avatar Jan 19 '24 08:01 PengSY

We have validated that the patch has fixed the issue. We will close this issue, but please reopen it if you find any further problems using the service.

santiagxf avatar Jan 19 '24 13:01 santiagxf