yocto-gl
yocto-gl copied to clipboard
Support FileStore for Model Registry
Signed-off-by: Xinyue Ruan [email protected]
Related Issues/PRs
#6485
What changes are proposed in this pull request?
Add FileStore support for model registry
How is this patch tested?
- [ ] I have written tests (not required for typo or doc fix) and confirmed the proposed feature/bug-fix/change works. Will add tests soon.
Does this PR change the documentation?
- [ ] No. You can skip the rest of this section.
- [x] Yes. Make sure the changed pages / sections render correctly by following the steps below.
- Click the
Details
link on thePreview docs
check. - Find the changed pages / sections and make sure they render correctly.
Release Notes
Is this a user-facing change?
- [ ] No. You can skip the rest of this section.
- [x] Yes. Give a description of this change to be included in the release notes for MLflow users.
(Details in 1-2 sentences. You can just refer to another PR with a description if this PR is part of a larger change.)
What component(s), interfaces, languages, and integrations does this PR affect?
Components
- [ ]
area/artifacts
: Artifact stores and artifact logging - [ ]
area/build
: Build and test infrastructure for MLflow - [ ]
area/docs
: MLflow documentation pages - [ ]
area/examples
: Example code - [x]
area/model-registry
: Model Registry service, APIs, and the fluent client calls for Model Registry - [ ]
area/models
: MLmodel format, model serialization/deserialization, flavors - [ ]
area/pipelines
: Pipelines, Pipeline APIs, Pipeline configs, Pipeline Templates - [ ]
area/projects
: MLproject format, project running backends - [ ]
area/scoring
: MLflow Model server, model deployment tools, Spark UDFs - [ ]
area/server-infra
: MLflow Tracking server backend - [ ]
area/tracking
: Tracking Service, tracking client APIs, autologging
Interface
- [ ]
area/uiux
: Front-end, user experience, plotting, JavaScript, JavaScript dev server - [ ]
area/docker
: Docker use across MLflow's components, such as MLflow Projects and MLflow Models - [ ]
area/sqlalchemy
: Use of SQLAlchemy in the Tracking Service or Model Registry - [ ]
area/windows
: Windows support
Language
- [ ]
language/r
: R APIs and clients - [ ]
language/java
: Java APIs and clients - [ ]
language/new
: Proposals for new client languages
Integrations
- [ ]
integrations/azure
: Azure and Azure ML integrations - [ ]
integrations/sagemaker
: SageMaker integrations - [ ]
integrations/databricks
: Databricks integrations
How should the PR be classified in the release notes? Choose one:
- [ ]
rn/breaking-change
- The PR will be mentioned in the "Breaking Changes" section - [ ]
rn/none
- No description will be included. The PR will be mentioned only by the PR number in the "Small Bugfixes and Documentation Updates" section - [x]
rn/feature
- A new user-facing feature worth mentioning in the release notes - [ ]
rn/bug-fix
- A user-facing bug fix worth mentioning in the release notes - [ ]
rn/documentation
- A user-facing documentation change worth mentioning in the release notes
Hi @dbczumar, this is still a draft PR without unit tests, but I'll add them in following commits. There're some details I might need some guidance/clarification 😃
- What's the default behavior of saving model versions under a registered model? My current implementation is like creating subfolder named "version-{version_numer}" for different versions under the registered model folder, is this expected?
- Method
transition_model_version_stage
contains a param namedstage
but the docstring shows asnew_stage
so we might want consistency here. Though I prefer new_stage, seems base and other storages are all using stage. - When we update a model_version stage, do we also update the last_update_timestamp of the parent registered_model? And a general question would be that should we update the parent registered_model whenever there's an action happening for model_versions within it?
- Is it possible if the same tag (key) for registered_model/model_version set multiple times? If so should we raise an error or we override previous value?
- Do we support restore of registered models? For now seems not, but any plans in the future?
Pls let me know if you have other concerns or suggestions. Thanks!
What's the default behavior of saving model versions under a registered model? My current implementation is like creating subfolder named "version-{version_numer}" for different versions under the registered model folder, is this expected?
Sounds good
Method transition_model_version_stage contains a param named stage but the docstring shows as new_stage so we might want consistency here. Though I prefer new_stage, seems base and other storages are all using stage.
Doc error. We should modify doc param name to be stage
When we update a model_version stage, do we also update the last_update_timestamp of the parent registered_model? And a general question would be that should we update the parent registered_model whenever there's an action happening for model_versions within it?
In sqlalchemy backend, some operations update registered_model.last_update_timestamp (e.g. transition_model_version_stage), but some others not (e.g. update_model_version). @dbczumar What's the target behavior do you expect ? All related operations update registered_model.last_update_timestamp ?
Is it possible if the same tag (key) for registered_model/model_version set multiple times? If so should we raise an error or we override previous value?
We should override previous value, You can try sqlalchemy backend to check its behavior.
Do we support restore of registered models? For now seems not, but any plans in the future?
Do you mean restore registered model after it being deleted ? Is it an important in some scenarios ?
What's the default behavior of saving model versions under a registered model? My current implementation is like creating subfolder named "version-{version_numer}" for different versions under the registered model folder, is this expected?
Sounds good
Method transition_model_version_stage contains a param named stage but the docstring shows as new_stage so we might want consistency here. Though I prefer new_stage, seems base and other storages are all using stage.
Doc error. We should modify doc param name to be stage
When we update a model_version stage, do we also update the last_update_timestamp of the parent registered_model? And a general question would be that should we update the parent registered_model whenever there's an action happening for model_versions within it?
In sqlalchemy backend, some operations update registered_model.last_update_timestamp (e.g. transition_model_version_stage), but some others not (e.g. update_model_version). @dbczumar What's the target behavior do you expect ? All related operations update registered_model.last_update_timestamp ?
Is it possible if the same tag (key) for registered_model/model_version set multiple times? If so should we raise an error or we override previous value?
We should override previous value, You can try sqlalchemy backend to check its behavior.
Do we support restore of registered models? For now seems not, but any plans in the future?
Do you mean restore registered model after it being deleted ? Is it an important in some scenarios ?
Thanks! I guess for the last one it's just out of curiosity. Because we support restore of deleting experiments :) No special scenarios
In sqlalchemy backend, some operations update registered_model.last_update_timestamp (e.g. transition_model_version_stage), but some others not (e.g. update_model_version). @dbczumar What's the target behavior do you expect ? All related operations update registered_model.last_update_timestamp ?
I think that all operations related to a registered model should update the timestamp.
I think that all operations related to a registered model should update the timestamp.
Then sqlalchemy backend need some update to satisfy this demand. I filed a ticket https://github.com/mlflow/mlflow/issues/6628
Thanks for the review!! I'll take a look at those comments while adding unit tests for all functions, and will ping you once it's ready for review 😃
@serena-ruan Could you add unit tests ? :)
@serena-ruan Could you add unit tests ? :)
Just added a bunch of tests and will refactor soon (It's too long and might contain lots of duplication logics there)
@dbczumar Hey I updated some unit tests, could you have another look when you got time? Thanks :D
Documentation preview for a66e71b8f721ba1670a93c0c43f9d0254ee633fd will be available here when this CircleCI job completes successfully.
More info
- Ignore this comment if this PR does not change the documentation.
- It takes a few minutes for the preview to be available.
- The preview is updated when a new commit is pushed to this PR.
- This comment was created by https://github.com/mlflow/mlflow/actions/runs/3459807498.
@WeichenXu123 @dbczumar Hey could you help review this again? 😃
Hi @serena-ruan , awesome progress! The code looks great!
I tried out the following workflow and ran into a search error:
import mlflow class MyMod(mlflow.pyfunc.PythonModel): def predict(self, ctx, inp): return 7 mlflow.pyfunc.log_model(python_model=MyMod(), artifact_path="foo", registered_model_name="model1") mlflow.pyfunc.log_model(python_model=MyMod(), artifact_path="foo", registered_model_name="model2") print(mlflow.MlflowClient().search_registered_models()) print(mlflow.MlflowClient().search_model_versions("name = 'model1'")) print(mlflow.MlflowClient().search_model_versions("name = 'model2'"))
Successfully registered model 'model1'. 2022/10/20 17:37:09 INFO mlflow.tracking._model_registry.client: Waiting up to 300 seconds for model version to finish creation. Model name: model1, version 1 Created version '1' of model 'model1'. Successfully registered model 'model2'. 2022/10/20 17:37:11 INFO mlflow.tracking._model_registry.client: Waiting up to 300 seconds for model version to finish creation. Model name: model2, version 1 Created version '1' of model 'model2'. Traceback (most recent call last): File "log.py", line 11, in <module> print(mlflow.MlflowClient().search_registered_models()) File "/Users/corey.zumar/mlflow/mlflow/tracking/client.py", line 2084, in search_registered_models return self._get_registry_client().search_registered_models( File "/Users/corey.zumar/mlflow/mlflow/tracking/_model_registry/client.py", line 130, in search_registered_models return self.store.search_registered_models(filter_string, max_results, order_by, page_token) File "/Users/corey.zumar/mlflow/mlflow/store/model_registry/file_store.py", line 309, in search_registered_models registered_models = self._list_all_registered_models() File "/Users/corey.zumar/mlflow/mlflow/store/model_registry/file_store.py", line 274, in _list_all_registered_models registered_models.append(self._get_registered_model_from_path(path)) File "/Users/corey.zumar/mlflow/mlflow/store/model_registry/file_store.py", line 175, in _get_registered_model_from_path meta = FileStore._read_yaml(model_path, FileStore.META_DATA_FILE_NAME) File "/Users/corey.zumar/mlflow/mlflow/store/model_registry/file_store.py", line 769, in _read_yaml return _read_helper(root, file_name, attempts_remaining=retries) File "/Users/corey.zumar/mlflow/mlflow/store/model_registry/file_store.py", line 762, in _read_helper result = read_yaml(root, file_name) File "/Users/corey.zumar/mlflow/mlflow/utils/file_utils.py", line 213, in read_yaml raise MissingConfigException("Yaml file '%s' does not exist." % file_path) mlflow.exceptions.MissingConfigException: Yaml file '/private/tmp/mreg/mlruns/.trash/meta.yaml' does not exist.
I'll continue trying out other workflows in the meantime. Can we address this error and add test coverage for it? Thanks so much!
I just added this testcase in the test file, and it works fine on my side 🤔 But I'll see the pipeline results just for double confirmation.
Hi @serena-ruan , awesome progress! The code looks great! I tried out the following workflow and ran into a search error:
import mlflow class MyMod(mlflow.pyfunc.PythonModel): def predict(self, ctx, inp): return 7 mlflow.pyfunc.log_model(python_model=MyMod(), artifact_path="foo", registered_model_name="model1") mlflow.pyfunc.log_model(python_model=MyMod(), artifact_path="foo", registered_model_name="model2") print(mlflow.MlflowClient().search_registered_models()) print(mlflow.MlflowClient().search_model_versions("name = 'model1'")) print(mlflow.MlflowClient().search_model_versions("name = 'model2'"))
Successfully registered model 'model1'. 2022/10/20 17:37:09 INFO mlflow.tracking._model_registry.client: Waiting up to 300 seconds for model version to finish creation. Model name: model1, version 1 Created version '1' of model 'model1'. Successfully registered model 'model2'. 2022/10/20 17:37:11 INFO mlflow.tracking._model_registry.client: Waiting up to 300 seconds for model version to finish creation. Model name: model2, version 1 Created version '1' of model 'model2'. Traceback (most recent call last): File "log.py", line 11, in <module> print(mlflow.MlflowClient().search_registered_models()) File "/Users/corey.zumar/mlflow/mlflow/tracking/client.py", line 2084, in search_registered_models return self._get_registry_client().search_registered_models( File "/Users/corey.zumar/mlflow/mlflow/tracking/_model_registry/client.py", line 130, in search_registered_models return self.store.search_registered_models(filter_string, max_results, order_by, page_token) File "/Users/corey.zumar/mlflow/mlflow/store/model_registry/file_store.py", line 309, in search_registered_models registered_models = self._list_all_registered_models() File "/Users/corey.zumar/mlflow/mlflow/store/model_registry/file_store.py", line 274, in _list_all_registered_models registered_models.append(self._get_registered_model_from_path(path)) File "/Users/corey.zumar/mlflow/mlflow/store/model_registry/file_store.py", line 175, in _get_registered_model_from_path meta = FileStore._read_yaml(model_path, FileStore.META_DATA_FILE_NAME) File "/Users/corey.zumar/mlflow/mlflow/store/model_registry/file_store.py", line 769, in _read_yaml return _read_helper(root, file_name, attempts_remaining=retries) File "/Users/corey.zumar/mlflow/mlflow/store/model_registry/file_store.py", line 762, in _read_helper result = read_yaml(root, file_name) File "/Users/corey.zumar/mlflow/mlflow/utils/file_utils.py", line 213, in read_yaml raise MissingConfigException("Yaml file '%s' does not exist." % file_path) mlflow.exceptions.MissingConfigException: Yaml file '/private/tmp/mreg/mlruns/.trash/meta.yaml' does not exist.
I'll continue trying out other workflows in the meantime. Can we address this error and add test coverage for it? Thanks so much!
I just added this testcase in the test file, and it works fine on my side 🤔 But I'll see the pipeline results just for double confirmation.
Thanks @serena-ruan ! The issue I'm running into is that _get_all_registered_models()
is including the .trash
folder in the list of subdirectories. I think the broader problem is that we're storing models in the top-level mlruns
directory at the same level as experiments and the trash. For example, if I create an experiment and then try to list registered models, I see the experiment in the list of registered model paths.
To test, I ran the following script:
import mlflow
from mlflow import MlflowClient
from mlflow.pyfunc import PythonModel
class MyModel(PythonModel):
def predict(self, context, model_input):
return 7
with mlflow.start_run():
mlflow.pyfunc.log_model(
python_model=MyModel(), artifact_path="foo", registered_model_name="model1"
)
with mlflow.start_run():
mlflow.log_param("A", "B")
from mlflow.store.model_registry.file_store import FileStore
fs = FileStore("mlruns")
print(fs._get_all_registered_model_paths())
This produces the following output:
Registered model 'model1' already exists. Creating a new version of this model...
2022/10/25 22:00:56 INFO mlflow.tracking._model_registry.client: Waiting up to 300 seconds for model version to finish creation. Model name: model1, version 3
Created version '3' of model 'model1'.
['mlruns/0', 'mlruns/model1', 'mlruns/.trash', 'mlruns/model2']
mlruns/0
corresponds to the default experiment with ID 0.
To solve this problem, I think we need to make a reserved folder called models
inside mlruns
. This should be safe since existing experiments in the directory use integer-like string IDs, so there shouldn't be any existing top-level folders called models
.
@dbczumar Hey Corey, hope I fixed the issue this time 😅
Hey @dbczumar do we have other concerns? And there're some enhancements you or Weichen have raised in this PR that I'm willing to add in follow-up PRs :)