
[BUG] log_model in azure ml failing but model still present in the model registry

Open gjurdzinski-deepsense opened this issue 1 year ago • 7 comments

Issues Policy acknowledgement

  • [X] I have read and agree to submit bug reports in accordance with the issues policy

Where did you encounter this bug?

Azure Machine Learning

Willingness to contribute

No. I cannot contribute a bug fix at this time.

MLflow version

  • Client: 2.10.0

System information

  • Python 3.11

Describe the problem

I have an Azure ML job which trains an sklearn model. Following azureml-examples, I want the model to be saved in the model registry (from inside the job, like in the "Reading and writing model in a job" part) and returned as an output. I run:

mlflow.sklearn.log_model(
    sk_model=model,
    artifact_path=model_output,
    registered_model_name=model_name,
)

and the result is that the model is saved in the model registry but the job fails with an error:

UserErrorException:
	Message: Model asset creation API failed with {'additional_properties': {'details': [{'code': 'ModelAssetPathNotFoundInStorage', 'message': 'No blobs found in storage at model asset path: azureml/4319dfec-3b63-472d-a27c-656c56197170/model_output/'}], 'message': 'The request is invalid.', 'statusCode': 400, 'code': 'BadRequest'}, 'error': <data_capability._restclient.model.models._models_py3.RootError object at 0x14d8501c5490>, 'correlation': {'operation': '378f68a2a921e81a0b51ce367ed9d501', 'request': 'c0318bc9200d9941', 'RequestId': 'c0318bc9200d9941'}, 'environment': 'westeurope', 'location': 'westeurope', 'time': datetime.datetime(2024, 2, 13, 10, 39, 18, 806659, tzinfo=<FixedOffset '+00:00'>), 'component_name': 'modelregistry'}
	InnerException None
	ErrorResponse 
{
    "error": {
        "code": "UserError",
        "message": "Model asset creation API failed with {'additional_properties': {'details': [{'code': 'ModelAssetPathNotFoundInStorage', 'message': 'No blobs found in storage at model asset path: azureml/4319dfec-3b63-472d-a27c-656c56197170/model_output/'}], 'message': 'The request is invalid.', 'statusCode': 400, 'code': 'BadRequest'}, 'error': <data_capability._restclient.model.models._models_py3.RootError object at 0x14d8501c5490>, 'correlation': {'operation': '378f68a2a921e81a0b51ce367ed9d501', 'request': 'c0318bc9200d9941', 'RequestId': 'c0318bc9200d9941'}, 'environment': 'westeurope', 'location': 'westeurope', 'time': datetime.datetime(2024, 2, 13, 10, 39, 18, 806659, tzinfo=<FixedOffset '+00:00'>), 'component_name': 'modelregistry'}"
    }
}

Screenshot from 2024-02-13 13-44-07 (second, invisible warning is the same as the error)

The code attached below is run as a command job in an Azure ML pipeline.

Tracking information

System information: Linux #61~20.04.1-Ubuntu SMP Tue Nov 21 17:50:57 UTC 2023
Python version: 3.11.7
MLflow version: 2.10.0
MLflow module location: /opt/conda/envs/ptca/lib/python3.11/site-packages/mlflow/__init__.py
Tracking URI: azureml://westeurope.api.azureml.ms/mlflow/v1.0/subscriptions/466c9654-1c8f-4bf5-95ba-c464c64aa485/resourceGroups/Hobbits-AI-Lab/providers/Microsoft.MachineLearningServices/workspaces/mordorml
Registry URI: azureml://westeurope.api.azureml.ms/mlflow/v1.0/subscriptions/466c9654-1c8f-4bf5-95ba-c464c64aa485/resourceGroups/Hobbits-AI-Lab/providers/Microsoft.MachineLearningServices/workspaces/mordorml
Active experiment ID: e6f0e63a-8430-4a04-bc15-25e98d991ca1
Active run ID: 5eb9ffc6-517b-4a25-8e9a-bdf10d50f0fe
Active run artifact URI: azureml://westeurope.api.azureml.ms/mlflow/v2.0/subscriptions/466c9654-1c8f-4bf5-95ba-c464c64aa485/resourceGroups/Hobbits-AI-Lab/providers/Microsoft.MachineLearningServices/workspaces/mordorml/experiments/e6f0e63a-8430-4a04-bc15-25e98d991ca1/runs/5eb9ffc6-517b-4a25-8e9a-bdf10d50f0fe/artifacts
MLflow environment variables: 
  MLFLOW_DISABLE_ENV_MANAGER_CONDA_WARNING: True
  MLFLOW_EXPERIMENT_ID: e6f0e63a-8430-4a04-bc15-25e98d991ca1
  MLFLOW_EXPERIMENT_NAME: train_fa_predictor_pipeline
  MLFLOW_TRACKING_TOKEN: eyJhbGciOiJSUzI1NiIsImtpZCI6IjA3RTU0ODI2RjE1ODI4N0M0OUU5QjlGMDZFMkM5RDYyNUM2Q0MyOTIiLCJ0eXAiOiJKV1QifQ.eyJyb2xlIjoiQ29udHJpYnV0b3IiLCJzY29wZSI6Ii9zdWJzY3JpcHRpb25zLzQ2NmM5NjU0LTFjOGYtNGJmNS05NWJhLWM0NjRjNjRhYTQ4NS9yZXNvdXJjZUdyb3Vwcy9Ib2JiaXRzLUFJLUxhYi9wcm92aWRlcnMvTWljcm9zb2Z0Lk1hY2hpbmVMZWFybmluZ1NlcnZpY2VzL3dvcmtzcGFjZXMvbW9yZG9ybWwiLCJhY2NvdW50aWQiOiIwMDAwMDAwMC0wMDAwLTAwMDAtMDAwMC0wMDAwMDAwMDAwMDAiLCJ3b3Jrc3BhY2VJZCI6ImI4MTA0ZTNmLTU1YzMtNGE1NS04ZDk1LTEzYmRjNWZiYjVjMSIsInByb2plY3RpZCI6IjAwMDAwMDAwLTAwMDAtMDAwMC0wMDAwLTAwMDAwMDAwMDAwMCIsImRpc2NvdmVyeSI6InVyaTovL2Rpc2NvdmVyeXVyaS8iLCJ0aWQiOiIxYjE2YWIzZS1iOGY2LTRmZTMtOWYzZS0yZGI3ZmU1NDlmNmEiLCJvaWQiOiJmNzMxOGVmOS0xOWMwLTRmMTktYjRlMi04YjUxMDY3MjNjMWMiLCJwdWlkIjoiMTAwMzIwMDMwQ0RFOUMzQyIsImlzcyI6ImF6dXJlbWwiLCJhcHBpZCI6IkpVUkRaSU5TS0kgR3J6ZWdvcnoiLCJleHAiOjE3MDk2NDczNTUsImF1ZCI6ImF6dXJlbWwifQ.OOTVOeIXLLSEqJCr_gQCnfYZMjUE3ConQneaBfn1yzCsmG8ZnGjXSfJvKImLg44eA7jelqgLN9vkTDqMcvhbNskh1xAQYf6OqrhLx-W7gTleasWgeW-NYIxq3s48JD3ylsyk61l6RKq9V-h3QHsb_NWWnycykoWkNwVVLhGWgNL0dzPlcE5_47YhAyUWZYKIVtr5t2ZQC6lcb7FfN4I3cjvnMuQN7LDVUw4gY5wBCND6LWGgGz5k-Ahu9LQI5pRZF67n9KBHK184UzD6sIkfQFcWqjmgPO5BJ2QDDvOWCqDfVsPV6tEBF6gwupupyCfNQuN6wY2LtPORk6AOVVVWDA
  MLFLOW_TRACKING_URI: azureml://westeurope.api.azureml.ms/mlflow/v1.0/subscriptions/466c9654-1c8f-4bf5-95ba-c464c64aa485/resourceGroups/Hobbits-AI-Lab/providers/Microsoft.MachineLearningServices/workspaces/mordorml
MLflow dependencies: 
  Flask: 3.0.2
  Jinja2: 3.1.3
  aiohttp: 3.9.3
  alembic: 1.13.1
  azure-storage-file-datalake: 12.14.0
  click: 8.1.7
  cloudpickle: 3.0.0
  databricks-cli: 0.18.0
  docker: 7.0.0
  entrypoints: 0.4
  fastapi: 0.104.1
  gitpython: 3.1.41
  gunicorn: 21.2.0
  importlib-metadata: 7.0.1
  markdown: 3.3.7
  matplotlib: 3.5.2
  numpy: 1.26.4
  packaging: 23.2
  pandas: 2.1.4
  protobuf: 3.20.2
  pyarrow: 14.0.1
  pydantic: 2.5.2
  pytz: 2023.4
  pyyaml: 6.0
  querystring-parser: 1.2.4
  requests: 2.31.0
  scikit-learn: 1.3.0
  scipy: 1.12.0
  sqlalchemy: 2.0.26
  sqlparse: 0.4.4
  tiktoken: 0.5.2
  uvicorn: 0.22.0
  virtualenv: 20.25.0

Code to reproduce issue

import mlflow.sklearn
import sklearn.preprocessing
import typer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.pipeline import make_pipeline


def train(
    model_output: str = typer.Argument(
        ...,
        help="Path where to save a model.",
    ),
    model_name: str = typer.Option(
        default="fa_predictor",
        help="Name used to save trained model.",
    ),
) -> None:
    
    # Load data

    scaler = sklearn.preprocessing.StandardScaler()
    model = make_pipeline(
        scaler, GradientBoostingClassifier(loss="log_loss", learning_rate=0.1, n_estimators=100, max_depth=3)
    )

    # Fit model

    mlflow.sklearn.log_model(
        sk_model=model,
        artifact_path=model_output,
        registered_model_name=model_name,
    )


if __name__ == "__main__":
    typer.run(train)

Stack trace

2024/02/13 11:12:45 WARNING mlflow.models.model: Logging model metadata to the tracking server has failed. The model artifacts have been logged successfully under azureml://westeurope.api.azureml.ms/mlflow/v2.0/subscriptions/466c9654-1c8f-4bf5-95ba-c464c64aa485/resourceGroups/Hobbits-AI-Lab/providers/Microsoft.MachineLearningServices/workspaces/mordorml/experiments/e6f0e63a-8430-4a04-bc15-25e98d991ca1/runs/74dcf196-82bc-4674-8e97-90fd57eab2b7/artifacts. Set logging level to DEBUG via `logging.getLogger("mlflow").setLevel(logging.DEBUG)` to see the full traceback.
2024/02/13 11:12:45 DEBUG mlflow.models.model: 
Traceback (most recent call last):
  File "/opt/conda/envs/ptca/lib/python3.11/site-packages/mlflow/models/model.py", line 625, in log
    mlflow.tracking.fluent._record_logged_model(mlflow_model, run_id)
  File "/opt/conda/envs/ptca/lib/python3.11/site-packages/mlflow/tracking/fluent.py", line 1348, in _record_logged_model
    MlflowClient()._record_logged_model(run_id, mlflow_model)
  File "/opt/conda/envs/ptca/lib/python3.11/site-packages/mlflow/tracking/client.py", line 1782, in _record_logged_model
    self._tracking_client._record_logged_model(run_id, mlflow_model)
  File "/opt/conda/envs/ptca/lib/python3.11/site-packages/mlflow/tracking/_tracking_service/client.py", line 494, in _record_logged_model
    self.store.record_logged_model(run_id, mlflow_model)
  File "/opt/conda/envs/ptca/lib/python3.11/site-packages/mlflow/store/tracking/rest_store.py", line 327, in record_logged_model
    self._call_endpoint(LogModel, req_body)
  File "/opt/conda/envs/ptca/lib/python3.11/site-packages/mlflow/store/tracking/rest_store.py", line 59, in _call_endpoint
    return call_endpoint(self.get_host_creds(), endpoint, method, json_body, response_proto)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/envs/ptca/lib/python3.11/site-packages/mlflow/utils/rest_utils.py", line 220, in call_endpoint
    response = verify_rest_response(response, endpoint)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/envs/ptca/lib/python3.11/site-packages/mlflow/utils/rest_utils.py", line 152, in verify_rest_response
    raise RestException(json.loads(response.text))
mlflow.exceptions.RestException: INVALID_PARAMETER_VALUE: Response: {'Error': {'Code': 'ValidationError', 'Severity': None, 'Message': 'The request is invalid.', 'MessageFormat': None, 'MessageParameters': None, 'ReferenceCode': None, 'DetailsUri': None, 'Target': None, 'Details': [], 'InnerError': None, 'DebugInfo': None, 'AdditionalInfo': None}, 'Correlation': {'operation': 'fca73f4adab629c35202e3e02505e070', 'request': '08a2656c522829df'}, 'Environment': 'westeurope', 'Location': 'westeurope', 'Time': '2024-02-13T11:12:45.869904+00:00', 'ComponentName': 'mlflow', 'statusCode': 400, 'error_code': 'INVALID_PARAMETER_VALUE'}
Registered model 'fa_predictor' already exists. Creating a new version of this model...

Other info / logs

No response

What component(s) does this bug affect?

  • [X] area/artifacts: Artifact stores and artifact logging
  • [ ] area/build: Build and test infrastructure for MLflow
  • [ ] area/deployments: MLflow Deployments client APIs, server, and third-party Deployments integrations
  • [ ] area/docs: MLflow documentation pages
  • [ ] area/examples: Example code
  • [X] area/model-registry: Model Registry service, APIs, and the fluent client calls for Model Registry
  • [ ] area/models: MLmodel format, model serialization/deserialization, flavors
  • [ ] area/recipes: Recipes, Recipe APIs, Recipe configs, Recipe Templates
  • [ ] area/projects: MLproject format, project running backends
  • [ ] area/scoring: MLflow Model server, model deployment tools, Spark UDFs
  • [ ] area/server-infra: MLflow Tracking server backend
  • [ ] area/tracking: Tracking Service, tracking client APIs, autologging

What interface(s) does this bug affect?

  • [ ] area/uiux: Front-end, user experience, plotting, JavaScript, JavaScript dev server
  • [ ] area/docker: Docker use across MLflow's components, such as MLflow Projects and MLflow Models
  • [ ] area/sqlalchemy: Use of SQLAlchemy in the Tracking Service or Model Registry
  • [ ] area/windows: Windows support

What language(s) does this bug affect?

  • [ ] language/r: R APIs and clients
  • [ ] language/java: Java APIs and clients
  • [ ] language/new: Proposals for new client languages

What integration(s) does this bug affect?

  • [X] integrations/azure: Azure and Azure ML integrations
  • [ ] integrations/sagemaker: SageMaker integrations
  • [ ] integrations/databricks: Databricks integrations

gjurdzinski-deepsense avatar Feb 13 '24 13:02 gjurdzinski-deepsense

@akshaya-a @santiagxf would you mind taking a look here? Thank you! :)

BenWilson2 avatar Feb 14 '24 01:02 BenWilson2

Sure, we are happy to take a look. @gjurdzinski-deepsense, can you tell us what value you are passing in the model_output variable?

santiagxf avatar Feb 14 '24 04:02 santiagxf

Thanks! When creating the job I'm passing Output(type=AssetTypes.CUSTOM_MODEL). When the job is running, the value of the model_output variable is just "model_output".
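
For context, a minimal sketch (with placeholder names, not the exact code used here) of how such a custom-model output is typically declared on an Azure ML command component with the v2 SDK:

from azure.ai.ml import Output, command
from azure.ai.ml.constants import AssetTypes

# Hypothetical component: "model_output" is mounted as a folder inside the job,
# and the training script receives the path to that folder as an argument.
train_component = command(
    name="train",                                         # placeholder name
    code="./src",                                         # placeholder code path
    command="python train.py ${{outputs.model_output}}",
    environment="my-env:1",                               # placeholder environment
    compute="cpu-cluster",                                # placeholder compute target
    outputs={"model_output": Output(type=AssetTypes.CUSTOM_MODEL)},
)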

gjurdzinski-deepsense avatar Feb 14 '24 07:02 gjurdzinski-deepsense

Let me see if I can repro and will get back to you.

santiagxf avatar Feb 14 '24 14:02 santiagxf

@gjurdzinski-deepsense I don't see azureml-mlflow in the list of installed packages. Can you confirm that you installed the AzureML MLflow plugin?

santiagxf avatar Feb 14 '24 19:02 santiagxf

@santiagxf I checked with pip list, it's installed:

Package                     Version
--------------------------- -----------
...
azureml-mlflow              1.55.0
...

For context on the whole installation: I base my Docker image on mcr.microsoft.com/azureml/curated/acpt-pytorch-2.0-cuda11.7:21, which comes with Python 3.8 and conda. I use Python 3.11.7 and Poetry in my project, so my Dockerfile looks like this:

FROM mcr.microsoft.com/azureml/curated/acpt-pytorch-2.0-cuda11.7:21
...
RUN conda install -y python=3.11.7
...
COPY poetry.lock pyproject.toml ./
RUN poetry install
...

gjurdzinski-deepsense avatar Feb 15 '24 08:02 gjurdzinski-deepsense

Thanks for the reply. Unfortunately, I couldn't reproduce the issue on my end. Can you please share the environment definition, the code, and the way you are generating the job so we can have a look? Alternatively, I'm sharing here an example very similar to what you are doing. Can you check whether you are doing something different?

The job definition is as follows:

job.yml

$schema: https://azuremlschemas.azureedge.net/latest/commandJob.schema.json
experiment_name: mlflow-log-model
environment:
    image: mcr.microsoft.com/azureml/openmpi4.1.0-ubuntu22.04:latest
    conda_file: conda.yml
code: train.py
command: pyrunit train.py train_model --input-data ${{inputs.input_data}} --model-path ${{inputs.model_path}} --registered-model-name ${{inputs.registered_model_name}}
inputs:
    model_path: model
    registered_model_name: heart-classifier-pipeline
    input_data:
        type: uri_file
        path: https://azuremlexampledata.blob.core.windows.net/data/heart-disease-uci/data/heart.csv
resources:
    instance_count: 1

The environment libraries are as follows:

conda.yml

channels:
- conda-forge
dependencies:
- python=3.11.7
- pip
- pip:
  - mlflow
  - azureml-mlflow
  - datasets
  - jobtools
  - cloudpickle==3.0.0
  - scikit-learn==1.4.0
  - scipy==1.12.0
  - xgboost==2.0.3
name: mlflow-env

The training code is as follows:

train.py

# %%
import mlflow
import pandas as pd
from mlflow.models import infer_signature
from sklearn.preprocessing import OrdinalEncoder
from sklearn.compose import ColumnTransformer
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, recall_score
from sklearn.pipeline import Pipeline
from xgboost import XGBClassifier

# %%
def train_model(input_data: str, model_path: str, registered_model_name: str = None):
    with mlflow.start_run():
        mlflow.xgboost.autolog(log_models=False)

        df = pd.read_csv(input_data)

        X_train, X_test, y_train, y_test = train_test_split(
            df.drop("target", axis=1), df["target"], test_size=0.3
        )
        
        encoder = ColumnTransformer(
            [
                (
                    "cat_encoding",
                    OrdinalEncoder(
                        categories="auto",
                        handle_unknown='use_encoded_value', 
                        unknown_value=-1,
                        encoded_missing_value=-1,
                    ),
                    ["thal"],
                )
            ],
            remainder="passthrough",
            verbose_feature_names_out=False,
        )
        model = XGBClassifier(use_label_encoder=False, eval_metric="logloss")

        pipeline = Pipeline(steps=[("encoding", encoder), ("model", model)])
        pipeline.fit(X_train, y_train)

        y_pred = pipeline.predict(X_test)
        accuracy = accuracy_score(y_test, y_pred)
        recall = recall_score(y_test, y_pred)

        mlflow.log_metric("test_accuracy", accuracy)
        mlflow.log_metric("test_recall", recall)

        signature = infer_signature(X_test, y_test)
        mlflow.sklearn.log_model(pipeline, 
                                 artifact_path=model_path, 
                                 signature=signature, 
                                 registered_model_name=registered_model_name)

You can run this example with:

az ml job create -f job.yml

The example files are in the following zip for your convenience: job.zip

santiagxf avatar Feb 15 '24 19:02 santiagxf

@mlflow/mlflow-team Please assign a maintainer and start triaging this issue.

github-actions[bot] avatar Feb 21 '24 00:02 github-actions[bot]

I'll try to narrow down our code to the pieces I can share, but I can already point out a few things:

  1. We don't define the job with a .yml file. We do it more or less like this (I'll prepare a working example later):
# Define local variables: code_path, ml_client, compute_cluster, environment, ...

def prepare_train_predictor_component(
    code_path: Path,
    ml_client: MLClient,
    compute_cluster: AmlCompute,
    environment: Environment,
    model_name: str = "predictor",
) -> Component:
    script_name = "train_predictor.py"

    train_component = command(
        name="train_predictor",
        display_name="Train Predictor",
        description="Trains Predictor",
        inputs={
            "dataset_path": Input(type="uri_folder", mode="ro_mount"),
            "train_csv_filename": Input(type="string", default="train.csv"),
            "test_csv_filename": Input(type="string", default="test.csv"),
            "model_name": Input(type="string", default=model_name),
        },
        outputs={
            "model_output": Output(type=AssetTypes.CUSTOM_MODEL),
        },
        code=code_path,
        command=" ".join(
            [
                f"python {script_name}",
                "${{inputs.dataset_path}}/${{inputs.train_csv_filename}}",
                "${{inputs.dataset_path}}/${{inputs.test_csv_filename}}",
                "${{outputs.model_output}}",
                "--model-name ${{inputs.model_name}}",
            ]
        ),
        environment=f"{environment.name}:{environment.version}",
        compute=compute_cluster.name,
    )

    component: Component = ml_client.create_or_update(train_component.component)

    return component


train_component = prepare_train_predictor_component(
    code_path=code_path,
    ml_client=ml_client,
    compute_cluster=compute_cluster,
    environment=environment,
    model_name=model_name,
)


@dsl.pipeline(
    compute=compute_cluster.name,
    description="Training Pipeline",
)
def train_predictor_pipeline(
    dataset_path: Input,
    train_csv_filename: Input,
    test_csv_filename: Input,
    model_name: str,
) -> PipelineJob:
    """Defines the Azure ML training pipeline."""
    train_job = train_component(
        dataset_path=dataset_path,
        train_csv_filename=train_csv_filename,
        test_csv_filename=test_csv_filename,
        model_name=model_name,
    )

    train_job.outputs.model_output = Output(type=AssetTypes.CUSTOM_MODEL)

    return {}


pipeline = train_predictor_pipeline(
    dataset_path=Input(
        type="uri_folder",
        path=os.path.join(azureml_storage_path, dataset_path),
        mode="ro_mount",
    ),
    train_csv_filename=train_csv_filename,
    test_csv_filename=test_csv_filename,
    model_name=model_name,
)

pipeline_job = ml_client.jobs.create_or_update(pipeline, experiment_name="train_predictor_pipeline")
ml_client.jobs.stream(pipeline_job.name)
  2. I added a with mlflow.start_run(): block to the training script, but it fails with:
UnsupportedModelRegistryStoreURIException:  Model registry functionality is  unavailable; got unsupported URI 'azureml://westeurope.api.azureml.ms/mlflow/v1.0/subscriptions/<redacted>/resourceGroups/<redacted>/providers/Microsoft.MachineLearningServices/workspaces/<redacted>' for model registry data storage. Supported URI schemes are: ['', 'file', 'databricks', 'databricks-uc', 'http', 'https', 'postgresql', 'mysql', 'sqlite', 'mssql']. See https://www.mlflow.org/docs/latest/tracking.html#storage for how to run an  MLflow server against one of the supported backend storage locations.

gjurdzinski-deepsense avatar Feb 22 '24 09:02 gjurdzinski-deepsense

Thanks for sharing! Using the Python SDK to create pipelines and jobs (it looks like you are using Azure ML pipelines) is completely supported. Based on the latter error message, I think we know what's going on: the azureml-mlflow plugin is not correctly installed in your environment. You can see that the azureml protocol is not being recognized. I suggest reviewing the environment used in each step of the pipeline and making sure it has the right dependencies.
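
A quick way to check this from inside a failing pipeline step is a small sketch like the one below (assuming the step runs Python and that the azureml-mlflow package is what provides the azureml.mlflow module):

import importlib.util

import mlflow

# Without the azureml-mlflow plugin, MLflow cannot resolve azureml:// URIs
# for either the tracking server or the model registry.
print("azureml.mlflow importable:", importlib.util.find_spec("azureml.mlflow") is not None)
print("Tracking URI:", mlflow.get_tracking_uri())
print("Registry URI:", mlflow.get_registry_uri())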

santiagxf avatar Feb 22 '24 13:02 santiagxf

I finally fixed it, and it turns out the problem was completely elsewhere. The job was fine; the pipeline was failing. The pipeline was setting the job output to Output(type=AssetTypes.CUSTOM_MODEL), and the asset creation was failing. That's why I could see the model in the model registry (the job succeeded in saving it there); the failure happened later.

Thanks for your time and support!

gjurdzinski-deepsense avatar Feb 27 '24 16:02 gjurdzinski-deepsense

Glad you solved the problem. I still think there was an error somewhere else, because your logs were showing a very clear message. We are always looking for ways to help users find errors more easily. If you can share the complete example, we can take a look. Thanks!

santiagxf avatar Feb 27 '24 17:02 santiagxf

The error was not on the MLflow side. Asset creation in Azure ML was failing when I was defining an Output of type CUSTOM_MODEL.
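
For illustration only (a hedged sketch, not confirmed as the fix in this thread): the two output asset types being contrasted here are declared as follows, where MLFLOW_MODEL denotes an MLflow-format model, i.e. a directory containing an MLmodel file such as the one mlflow.sklearn.log_model produces:

from azure.ai.ml import Output
from azure.ai.ml.constants import AssetTypes

# Output type used in this thread, for which asset creation failed:
custom_model_output = Output(type=AssetTypes.CUSTOM_MODEL)

# MLflow-format model output (a directory containing an MLmodel file):
mlflow_model_output = Output(type=AssetTypes.MLFLOW_MODEL)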

gjurdzinski-deepsense avatar Mar 13 '24 08:03 gjurdzinski-deepsense

> I finally fixed it, and it turns out the problem was completely elsewhere. The job was fine; the pipeline was failing. The pipeline was setting the job output to Output(type=AssetTypes.CUSTOM_MODEL), and the asset creation was failing. That's why I could see the model in the model registry (the job succeeded in saving it there); the failure happened later.
>
> Thanks for your time and support!

I have the same issue, trying to connect a few jobs in a pipeline. The output of job no. 1 is an MLFLOW-type model which should go to the input of job no. 2. Could you elaborate on what exactly the problem was in your case and how you fixed it? Was it just a change from CUSTOM_MODEL to MLFLOW_MODEL, or was there some other issue? Thanks in advance.

alex-bronia avatar Apr 17 '24 10:04 alex-bronia