yocto-gl Support FileStore for Model Registry

Signed-off-by: Xinyue Ruan [email protected]

Related Issues/PRs

#6485

What changes are proposed in this pull request?

Add FileStore support for model registry

How is this patch tested?

[ ] I have written tests (not required for typo or doc fix) and confirmed the proposed feature/bug-fix/change works. Will add tests soon.

Does this PR change the documentation?

[ ] No. You can skip the rest of this section.
[x] Yes. Make sure the changed pages / sections render correctly by following the steps below.

Click the Details link on the Preview docs check.
Find the changed pages / sections and make sure they render correctly.

Release Notes

Is this a user-facing change?

[ ] No. You can skip the rest of this section.
[x] Yes. Give a description of this change to be included in the release notes for MLflow users.

(Details in 1-2 sentences. You can just refer to another PR with a description if this PR is part of a larger change.)

What component(s), interfaces, languages, and integrations does this PR affect?

Components

[ ] area/artifacts: Artifact stores and artifact logging
[ ] area/build: Build and test infrastructure for MLflow
[ ] area/docs: MLflow documentation pages
[ ] area/examples: Example code
[x] area/model-registry: Model Registry service, APIs, and the fluent client calls for Model Registry
[ ] area/models: MLmodel format, model serialization/deserialization, flavors
[ ] area/pipelines: Pipelines, Pipeline APIs, Pipeline configs, Pipeline Templates
[ ] area/projects: MLproject format, project running backends
[ ] area/scoring: MLflow Model server, model deployment tools, Spark UDFs
[ ] area/server-infra: MLflow Tracking server backend
[ ] area/tracking: Tracking Service, tracking client APIs, autologging

Interface

[ ] area/uiux: Front-end, user experience, plotting, JavaScript, JavaScript dev server
[ ] area/docker: Docker use across MLflow's components, such as MLflow Projects and MLflow Models
[ ] area/sqlalchemy: Use of SQLAlchemy in the Tracking Service or Model Registry
[ ] area/windows: Windows support

Language

[ ] language/r: R APIs and clients
[ ] language/java: Java APIs and clients
[ ] language/new: Proposals for new client languages

Integrations

[ ] integrations/azure: Azure and Azure ML integrations
[ ] integrations/sagemaker: SageMaker integrations
[ ] integrations/databricks: Databricks integrations

How should the PR be classified in the release notes? Choose one:

[ ] rn/breaking-change - The PR will be mentioned in the "Breaking Changes" section
[ ] rn/none - No description will be included. The PR will be mentioned only by the PR number in the "Small Bugfixes and Documentation Updates" section
[x] rn/feature - A new user-facing feature worth mentioning in the release notes
[ ] rn/bug-fix - A user-facing bug fix worth mentioning in the release notes
[ ] rn/documentation - A user-facing documentation change worth mentioning in the release notes

Aug 26 '22 10:08 serena-ruan

Hi @dbczumar, this is still a draft PR without unit tests, but I'll add them in following commits. There're some details I might need some guidance/clarification 😃

What's the default behavior of saving model versions under a registered model? My current implementation is like creating subfolder named "version-{version_numer}" for different versions under the registered model folder, is this expected?
Method transition_model_version_stage contains a param named stage but the docstring shows as new_stage so we might want consistency here. Though I prefer new_stage, seems base and other storages are all using stage.
When we update a model_version stage, do we also update the last_update_timestamp of the parent registered_model? And a general question would be that should we update the parent registered_model whenever there's an action happening for model_versions within it?
Is it possible if the same tag (key) for registered_model/model_version set multiple times? If so should we raise an error or we override previous value?
Do we support restore of registered models? For now seems not, but any plans in the future?

Pls let me know if you have other concerns or suggestions. Thanks!

Aug 26 '22 10:08 serena-ruan

What's the default behavior of saving model versions under a registered model? My current implementation is like creating subfolder named "version-{version_numer}" for different versions under the registered model folder, is this expected?

Sounds good

Method transition_model_version_stage contains a param named stage but the docstring shows as new_stage so we might want consistency here. Though I prefer new_stage, seems base and other storages are all using stage.

Doc error. We should modify doc param name to be stage

When we update a model_version stage, do we also update the last_update_timestamp of the parent registered_model? And a general question would be that should we update the parent registered_model whenever there's an action happening for model_versions within it?

In sqlalchemy backend, some operations update registered_model.last_update_timestamp (e.g. transition_model_version_stage), but some others not (e.g. update_model_version). @dbczumar What's the target behavior do you expect ? All related operations update registered_model.last_update_timestamp ?

Is it possible if the same tag (key) for registered_model/model_version set multiple times? If so should we raise an error or we override previous value?

We should override previous value, You can try sqlalchemy backend to check its behavior.

Do we support restore of registered models? For now seems not, but any plans in the future?

Do you mean restore registered model after it being deleted ? Is it an important in some scenarios ?

Aug 26 '22 13:08 WeichenXu123

What's the default behavior of saving model versions under a registered model? My current implementation is like creating subfolder named "version-{version_numer}" for different versions under the registered model folder, is this expected?

Sounds good

Method transition_model_version_stage contains a param named stage but the docstring shows as new_stage so we might want consistency here. Though I prefer new_stage, seems base and other storages are all using stage.

Doc error. We should modify doc param name to be stage

When we update a model_version stage, do we also update the last_update_timestamp of the parent registered_model? And a general question would be that should we update the parent registered_model whenever there's an action happening for model_versions within it?

In sqlalchemy backend, some operations update registered_model.last_update_timestamp (e.g. transition_model_version_stage), but some others not (e.g. update_model_version). @dbczumar What's the target behavior do you expect ? All related operations update registered_model.last_update_timestamp ?

Is it possible if the same tag (key) for registered_model/model_version set multiple times? If so should we raise an error or we override previous value?

We should override previous value, You can try sqlalchemy backend to check its behavior.

Do we support restore of registered models? For now seems not, but any plans in the future?

Do you mean restore registered model after it being deleted ? Is it an important in some scenarios ?

Thanks! I guess for the last one it's just out of curiosity. Because we support restore of deleting experiments :) No special scenarios

Aug 26 '22 13:08 serena-ruan

In sqlalchemy backend, some operations update registered_model.last_update_timestamp (e.g. transition_model_version_stage), but some others not (e.g. update_model_version). @dbczumar What's the target behavior do you expect ? All related operations update registered_model.last_update_timestamp ?

I think that all operations related to a registered model should update the timestamp.

Aug 28 '22 19:08 dbczumar

I think that all operations related to a registered model should update the timestamp.

Then sqlalchemy backend need some update to satisfy this demand. I filed a ticket https://github.com/mlflow/mlflow/issues/6628

Aug 29 '22 09:08 WeichenXu123

Thanks for the review!! I'll take a look at those comments while adding unit tests for all functions, and will ping you once it's ready for review 😃

Aug 31 '22 02:08 serena-ruan

@serena-ruan Could you add unit tests ? :)

Aug 31 '22 12:08 WeichenXu123

@serena-ruan Could you add unit tests ? :)

Just added a bunch of tests and will refactor soon (It's too long and might contain lots of duplication logics there)

Aug 31 '22 13:08 serena-ruan

@dbczumar Hey I updated some unit tests, could you have another look when you got time? Thanks :D

Sep 05 '22 14:09 serena-ruan

Documentation preview for a66e71b8f721ba1670a93c0c43f9d0254ee633fd will be available here when this CircleCI job completes successfully.

More info

Ignore this comment if this PR does not change the documentation.
It takes a few minutes for the preview to be available.
The preview is updated when a new commit is pushed to this PR.
This comment was created by https://github.com/mlflow/mlflow/actions/runs/3459807498.

Oct 10 '22 16:10 mlflow-automation

@WeichenXu123 @dbczumar Hey could you help review this again? 😃

Oct 12 '22 02:10 serena-ruan

Hi @serena-ruan , awesome progress! The code looks great!

I tried out the following workflow and ran into a search error:

import mlflow

class MyMod(mlflow.pyfunc.PythonModel):
    def predict(self, ctx, inp):
        return 7


mlflow.pyfunc.log_model(python_model=MyMod(), artifact_path="foo", registered_model_name="model1")
mlflow.pyfunc.log_model(python_model=MyMod(), artifact_path="foo", registered_model_name="model2")

print(mlflow.MlflowClient().search_registered_models())
print(mlflow.MlflowClient().search_model_versions("name = 'model1'"))
print(mlflow.MlflowClient().search_model_versions("name = 'model2'"))

Successfully registered model 'model1'.
2022/10/20 17:37:09 INFO mlflow.tracking._model_registry.client: Waiting up to 300 seconds for model version to finish creation.                     Model name: model1, version 1
Created version '1' of model 'model1'.
Successfully registered model 'model2'.
2022/10/20 17:37:11 INFO mlflow.tracking._model_registry.client: Waiting up to 300 seconds for model version to finish creation.                     Model name: model2, version 1
Created version '1' of model 'model2'.
Traceback (most recent call last):
  File "log.py", line 11, in <module>
    print(mlflow.MlflowClient().search_registered_models())
  File "/Users/corey.zumar/mlflow/mlflow/tracking/client.py", line 2084, in search_registered_models
    return self._get_registry_client().search_registered_models(
  File "/Users/corey.zumar/mlflow/mlflow/tracking/_model_registry/client.py", line 130, in search_registered_models
    return self.store.search_registered_models(filter_string, max_results, order_by, page_token)
  File "/Users/corey.zumar/mlflow/mlflow/store/model_registry/file_store.py", line 309, in search_registered_models
    registered_models = self._list_all_registered_models()
  File "/Users/corey.zumar/mlflow/mlflow/store/model_registry/file_store.py", line 274, in _list_all_registered_models
    registered_models.append(self._get_registered_model_from_path(path))
  File "/Users/corey.zumar/mlflow/mlflow/store/model_registry/file_store.py", line 175, in _get_registered_model_from_path
    meta = FileStore._read_yaml(model_path, FileStore.META_DATA_FILE_NAME)
  File "/Users/corey.zumar/mlflow/mlflow/store/model_registry/file_store.py", line 769, in _read_yaml
    return _read_helper(root, file_name, attempts_remaining=retries)
  File "/Users/corey.zumar/mlflow/mlflow/store/model_registry/file_store.py", line 762, in _read_helper
    result = read_yaml(root, file_name)
  File "/Users/corey.zumar/mlflow/mlflow/utils/file_utils.py", line 213, in read_yaml
    raise MissingConfigException("Yaml file '%s' does not exist." % file_path)
mlflow.exceptions.MissingConfigException: Yaml file '/private/tmp/mreg/mlruns/.trash/meta.yaml' does not exist.

I'll continue trying out other workflows in the meantime. Can we address this error and add test coverage for it? Thanks so much!

I just added this testcase in the test file, and it works fine on my side 🤔 But I'll see the pipeline results just for double confirmation.

Oct 21 '22 15:10 serena-ruan

Hi @serena-ruan , awesome progress! The code looks great! I tried out the following workflow and ran into a search error:

import mlflow

class MyMod(mlflow.pyfunc.PythonModel):
    def predict(self, ctx, inp):
        return 7


mlflow.pyfunc.log_model(python_model=MyMod(), artifact_path="foo", registered_model_name="model1")
mlflow.pyfunc.log_model(python_model=MyMod(), artifact_path="foo", registered_model_name="model2")

print(mlflow.MlflowClient().search_registered_models())
print(mlflow.MlflowClient().search_model_versions("name = 'model1'"))
print(mlflow.MlflowClient().search_model_versions("name = 'model2'"))

Successfully registered model 'model1'.
2022/10/20 17:37:09 INFO mlflow.tracking._model_registry.client: Waiting up to 300 seconds for model version to finish creation.                     Model name: model1, version 1
Created version '1' of model 'model1'.
Successfully registered model 'model2'.
2022/10/20 17:37:11 INFO mlflow.tracking._model_registry.client: Waiting up to 300 seconds for model version to finish creation.                     Model name: model2, version 1
Created version '1' of model 'model2'.
Traceback (most recent call last):
  File "log.py", line 11, in <module>
    print(mlflow.MlflowClient().search_registered_models())
  File "/Users/corey.zumar/mlflow/mlflow/tracking/client.py", line 2084, in search_registered_models
    return self._get_registry_client().search_registered_models(
  File "/Users/corey.zumar/mlflow/mlflow/tracking/_model_registry/client.py", line 130, in search_registered_models
    return self.store.search_registered_models(filter_string, max_results, order_by, page_token)
  File "/Users/corey.zumar/mlflow/mlflow/store/model_registry/file_store.py", line 309, in search_registered_models
    registered_models = self._list_all_registered_models()
  File "/Users/corey.zumar/mlflow/mlflow/store/model_registry/file_store.py", line 274, in _list_all_registered_models
    registered_models.append(self._get_registered_model_from_path(path))
  File "/Users/corey.zumar/mlflow/mlflow/store/model_registry/file_store.py", line 175, in _get_registered_model_from_path
    meta = FileStore._read_yaml(model_path, FileStore.META_DATA_FILE_NAME)
  File "/Users/corey.zumar/mlflow/mlflow/store/model_registry/file_store.py", line 769, in _read_yaml
    return _read_helper(root, file_name, attempts_remaining=retries)
  File "/Users/corey.zumar/mlflow/mlflow/store/model_registry/file_store.py", line 762, in _read_helper
    result = read_yaml(root, file_name)
  File "/Users/corey.zumar/mlflow/mlflow/utils/file_utils.py", line 213, in read_yaml
    raise MissingConfigException("Yaml file '%s' does not exist." % file_path)
mlflow.exceptions.MissingConfigException: Yaml file '/private/tmp/mreg/mlruns/.trash/meta.yaml' does not exist.

I'll continue trying out other workflows in the meantime. Can we address this error and add test coverage for it? Thanks so much!

I just added this testcase in the test file, and it works fine on my side 🤔 But I'll see the pipeline results just for double confirmation.

Thanks @serena-ruan ! The issue I'm running into is that _get_all_registered_models() is including the .trash folder in the list of subdirectories. I think the broader problem is that we're storing models in the top-level mlruns directory at the same level as experiments and the trash. For example, if I create an experiment and then try to list registered models, I see the experiment in the list of registered model paths.

To test, I ran the following script:

import mlflow
from mlflow import MlflowClient
from mlflow.pyfunc import PythonModel

class MyModel(PythonModel):
    def predict(self, context, model_input):
        return 7

with mlflow.start_run():
    mlflow.pyfunc.log_model(
        python_model=MyModel(), artifact_path="foo", registered_model_name="model1"
    )

with mlflow.start_run():
    mlflow.log_param("A", "B")

from mlflow.store.model_registry.file_store import FileStore

fs = FileStore("mlruns")
print(fs._get_all_registered_model_paths())

This produces the following output:

Registered model 'model1' already exists. Creating a new version of this model...
2022/10/25 22:00:56 INFO mlflow.tracking._model_registry.client: Waiting up to 300 seconds for model version to finish creation.                     Model name: model1, version 3
Created version '3' of model 'model1'.
['mlruns/0', 'mlruns/model1', 'mlruns/.trash', 'mlruns/model2']

mlruns/0 corresponds to the default experiment with ID 0.

To solve this problem, I think we need to make a reserved folder called models inside mlruns. This should be safe since existing experiments in the directory use integer-like string IDs, so there shouldn't be any existing top-level folders called models.

Oct 26 '22 05:10 dbczumar

@dbczumar Hey Corey, hope I fixed the issue this time 😅

Nov 08 '22 08:11 serena-ruan

Hey @dbczumar do we have other concerns? And there're some enhancements you or Weichen have raised in this PR that I'm willing to add in follow-up PRs :)

Nov 17 '22 06:11 serena-ruan

yocto-gl yocto-gl copied to clipboard

Support FileStore for Model Registry

Related Issues/PRs

What changes are proposed in this pull request?

How is this patch tested?

Does this PR change the documentation?

Release Notes

Is this a user-facing change?

What component(s), interfaces, languages, and integrations does this PR affect?

How should the PR be classified in the release notes? Choose one:

yocto-gl
yocto-gl copied to clipboard