feat: Added Model Metadata support in Registry
What this PR does / why we need it:
Store model-related metadata (e.g., model name, features used, project name, timestamp) inside the registry, alongside other Feast objects (like Feature-views, Features, etc.).
This lets users link features to models and visualize the relationships between them, so they can answer questions such as:
- Which features were used to train this model?
- If I change this feature, which models will be affected?
- When was a model trained?
In [1]: from feast import FeatureStore
In [2]: from feast.model import ModelMetadata
In [3]: fs = FeatureStore("/feast/feature_repo")
In [4]: model = ModelMetadata(name="fraud_detection_v1", project="fraud_detection", tags={"team": "data_scientist"})
In [5]: fs.apply(model)
In [6]: fs.list_models()
Out[6]:
[ModelMetadata(
    name='fraud_detection_v1',
    project='fraud_detection',
    feature_view=[],
    feature_service=[],
    features=[],
    tags={'team': 'data_scientist'},
    training_timestamp=None,
    description='',
)]
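The lineage questions listed above could be answered with simple lookups over the objects returned by `fs.list_models()`. The following is only a minimal sketch: `ModelRecord` is a hypothetical stand-in whose fields mirror the repr shown above (it is not the `ModelMetadata` class from this PR), and the feature references and the timestamp are made up for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, List, Optional


@dataclass
class ModelRecord:
    """Hypothetical stand-in for ModelMetadata; fields mirror the repr above."""
    name: str
    project: str
    features: List[str] = field(default_factory=list)
    tags: Dict[str, str] = field(default_factory=dict)
    training_timestamp: Optional[datetime] = None


def features_of(models: List[ModelRecord], model_name: str) -> List[str]:
    """Which features were used to train this model?"""
    for model in models:
        if model.name == model_name:
            return list(model.features)
    raise KeyError(model_name)


def models_affected_by(models: List[ModelRecord], feature_ref: str) -> List[str]:
    """If I change this feature, which models will be affected?"""
    return [model.name for model in models if feature_ref in model.features]


def trained_at(models: List[ModelRecord], model_name: str) -> Optional[datetime]:
    """When was this model trained? Returns None if no timestamp was recorded."""
    for model in models:
        if model.name == model_name:
            return model.training_timestamp
    raise KeyError(model_name)


# Illustrative data only; these feature references are invented.
models = [
    ModelRecord(
        name="fraud_detection_v1",
        project="fraud_detection",
        features=["transactions:amount", "transactions:merchant_id"],
        training_timestamp=datetime(2025, 1, 15),
    ),
    ModelRecord(
        name="fraud_detection_v2",
        project="fraud_detection",
        features=["transactions:amount"],
    ),
]

print(features_of(models, "fraud_detection_v1"))
print(models_affected_by(models, "transactions:merchant_id"))
print(trained_at(models, "fraud_detection_v2"))
```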
Misc
Next followup steps:
- Add unit/integration tests
- Add documentation
- Implement remote registry support
- Process features/Feature-views/Feature-services based on Model Metadata.
- Add a “Models” tab in Feast UI
- Add lineage showing features-model relations
- Add gRPC and REST registry endpoints
- Add CLI for models
Let me suggest a different approach, if I may. If I understand correctly, this would be part of Feast only to enable proper model-feature lineage, right? I think introducing a separate proto object (with everything that entails, RBAC and so on) might be overkill for a non-essential piece of information. Also, in my experience, users are very likely to neglect documenting models in Feast, especially when this much effort is required.
Can't we instead go for a lighter integration by simply introducing a field (a string, or maybe something a bit more complicated) in FeatureService that would act as a link to whatever model registry users use? FeatureService is supposed to have a one-to-one relationship with models as-is anyway. Even if for some reason a model queries Feast without a FeatureService, creating a dummy FeatureService just for better lineage would be roughly equivalent, in terms of effort required, to the ModelMetadata approach.
@tokoko what do you think about MLflow here?
Do we have to make a choice? I would go either with an open-text string that the user is free to fill in however they like, or with a oneof with MlFlowModel and ModelRegistryModel as the possible options. For MLflow, the tracking server URL and model name (and maybe the model version?) should probably be enough. I'm not familiar with Model Registry, but it's probably something similar there as well.
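To make the two options concrete, here is a rough sketch in plain Python rather than proto. Everything in it is hypothetical: the class names, the `ModelLink` alias, and especially the `ModelRegistryModelRef` fields, which are a guess by analogy with the MLflow case. Nothing here is an existing Feast API.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class MlflowModelRef:
    """Hypothetical link to a model tracked in MLflow."""
    tracking_uri: str
    model_name: str
    model_version: Optional[str] = None


@dataclass(frozen=True)
class ModelRegistryModelRef:
    """Hypothetical link to a Model Registry entry; fields are assumed."""
    registry_url: str
    model_name: str
    model_version: Optional[str] = None


# Python analogue of a proto oneof, with free text as the fallback option.
ModelLink = Union[str, MlflowModelRef, ModelRegistryModelRef]


def describe_link(link: ModelLink) -> str:
    """Render a link for display, e.g. in a lineage view."""
    if isinstance(link, MlflowModelRef):
        version = f" v{link.model_version}" if link.model_version else ""
        return f"mlflow: {link.model_name}{version} @ {link.tracking_uri}"
    if isinstance(link, ModelRegistryModelRef):
        version = f" v{link.model_version}" if link.model_version else ""
        return f"model-registry: {link.model_name}{version} @ {link.registry_url}"
    return f"free-text: {link}"


# Placeholder URL and names, for illustration only.
print(describe_link(MlflowModelRef("http://mlflow.example", "fraud_detection", "1")))
print(describe_link("see the team wiki"))
```

The free-text variant costs users the least effort, while the typed variants would give a UI or lineage view something structured to work with.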
Can't we instead go for a lighter integration by simply introducing a field (a string, or maybe something a bit more complicated) in FeatureService that would act as a link to whatever model registry users use? FeatureService is supposed to have a one-to-one relationship with models as-is anyway. Even if for some reason a model queries Feast without a FeatureService, creating a dummy FeatureService just for better lineage would be roughly equivalent, in terms of effort required, to the ModelMetadata approach.
I see your point. Adding a simple reference field to FeatureService is a practical solution that meets the immediate need with minimal implementation and maintenance overhead. For many users, especially those not deeply invested in formal model management infrastructure, this lightweight FeatureService linkage might be more than sufficient. However, this approach might not scale well for users who have large teams and need tighter integration with model training pipelines or registries.
My thinking is that FeatureService is fundamentally a construct for combining and serving a set of features, often composed from multiple FeatureViews, and it can be reused across multiple models. Users might tweak a few features between models, or reuse the same FeatureService across different experiments. So using FeatureService as the primary location for model-specific metadata might look like a workaround.
Also, in my experience, users are very likely to neglect documenting models in Feast, especially when this much effort is required.
That’s a fair concern. But one of the key advantages of having a structured ModelMetadata proto is that it opens the door to automation. Metadata could be auto-populated as part of model training or deployment pipelines, and it would also allow users to reference specific training runs.
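As a rough illustration of that kind of automation, a training pipeline could assemble the metadata in its final step. This is only a sketch: `build_model_record`, the `training_run_id` tag key, and the example values are hypothetical. In a real pipeline the resulting fields would be passed to `ModelMetadata(...)` and registered with `fs.apply(...)` as in the PR description.

```python
from datetime import datetime, timezone
from typing import Any, Dict, Iterable, Optional


def build_model_record(
    name: str,
    project: str,
    features: Iterable[str],
    run_id: str,
    now: Optional[datetime] = None,
) -> Dict[str, Any]:
    """Collect model metadata at the end of a training run (hypothetical helper)."""
    timestamp = now if now is not None else datetime.now(timezone.utc)
    return {
        "name": name,
        "project": project,
        # Sort and de-duplicate so repeated runs produce a stable feature list.
        "features": sorted(set(features)),
        # Assumed tag key for pointing back at the specific training run.
        "tags": {"training_run_id": run_id},
        "training_timestamp": timestamp,
    }


# Illustrative values only; the feature references and run id are invented.
record = build_model_record(
    name="fraud_detection_v1",
    project="fraud_detection",
    features=["transactions:merchant_id", "transactions:amount"],
    run_id="run-0001",
    now=datetime(2025, 1, 15, tzinfo=timezone.utc),
)
print(record)
```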
Having a dedicated ModelMetadata, even a lightweight one or a oneof with MlFlowModel and ModelRegistryModel, gives us better flexibility. Thoughts?
we should discuss with @tarilabs @HumairAK and @szaher
Thank you for tagging me. If it's of any help, here is an entry point for KF MR references, with the caveat that we're effectively moving away from the Google MLMD dependency (transparently for the user).