Feature retrieval latency scales with number of feature views, doesn't appear to cache
I'd like to preface this by saying that I'm new to feast. I'm also more of a user of it as part of a subsystem in a larger ecosystem at my job, and I don't interact with the library directly, only through an API that sits between it and me. So hopefully my understanding of how it works is correct, but if not, I would love to understand more about how this system works!
Expected Behavior
My understanding is that "projects" and "feature views" are roughly analogous to schemas and tables in a database: the project acts as a namespace for feature views, which in turn act as a second-tier namespace for features, the same way a schema namespaces tables, which namespace columns. In this mental model, I expect that when I call FeatureStore.get_online_features, it does some kind of fast lookup keyed on the (project, feature view) combination (maybe a hash table?) to figure out where the features are located, and then retrieves the data. If that map is not available locally, it can be fetched from the remote registry and then cached for repeated use. Once cached, I would expect the overhead of resolving which feature view or data location to use to drop to near zero, while retrieving the data itself would be unchanged (i.e. scale with the size of the data plus normal network latency).
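To make the mental model concrete, here is a minimal sketch of the kind of call I have in mind (the feature reference and entity key names are placeholders, not our real ones; the project name comes from feature_store.yaml):

```python
# Minimal sketch; "my_feature_view", "my_feature", and "entity_id" are placeholders.
from feast import FeatureStore

store = FeatureStore(repo_path=".")  # project name is taken from feature_store.yaml

# Feature references are addressed as "<feature view>:<feature>", i.e. the
# second tier of the (project, feature view, feature) namespace.
response = store.get_online_features(
    features=["my_feature_view:my_feature"],
    entity_rows=[{"entity_id": 123}],
)
print(response.to_dict())
```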
Current Behavior
What I observe is that when querying for data using FeatureStore.get_online_features from a particular project I'm working with, it appears that searching for the appropriate feature view takes a significant amount of time when first run (as expected if the registry has not been cached yet; our registry is stored in a GCS bucket, for reference). I measure this by profiling my code (using line_profiler) and looking at time spent in the function FeatureStore._get_feature_views_to_use. Then there is some time for the data itself to be received, which I measure by looking at time spent in FeatureStore._read_from_online_store (in our case, online feature data is stored on a redis instance). On successive calls to get_online_features, the time spent querying the registry for the feature view metadata remains unchanged, counter to what I would expect (less time spent) if this data had been cached. Furthermore, successive calls result in significantly less time spent retrieving the actual feature data from redis (which I would expect to be unchanged).
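For reference, this is roughly how I hook line_profiler up to those two internal functions (the feature reference and entity key below are placeholders, not the ones we actually query):

```python
# Rough sketch of the profiling setup; "my_feature_view:my_feature" and
# "entity_id" are placeholder names.
from line_profiler import LineProfiler

from feast import FeatureStore

store = FeatureStore(repo_path=".")

profiler = LineProfiler()
# Register the two internals so their per-line timings are recorded.
profiler.add_function(FeatureStore._get_feature_views_to_use)
profiler.add_function(FeatureStore._read_from_online_store)

profiler.enable_by_count()
store.get_online_features(
    features=["my_feature_view:my_feature"],
    entity_rows=[{"entity_id": 123}],
)
profiler.disable_by_count()
profiler.print_stats()
```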
In the process of diagnosing this, I tried querying feature data from a different project to see if it also had the same problems. The main difference is that this second project is significantly "smaller" than the first. The first project has maybe 20 feature views, which have anywhere between 2 and 32 features each. The second project only has two feature views with two features each. What I noticed was that when switching to project two, the time spent retrieving the registry dropped significantly (to like 10% of the time it takes to do the same for project one), although it still did not appear to be caching correctly as the time did not change on successive calls. Time retrieving the actual data was similar in both projects. See the table below for a summary of what we tried/observed:
| project | # features in queried view | ran before profiling | _get_feature_views_to_use | _read_from_online_store |
|---|---|---|---|---|
| 1 | 32 | no | 101 ms | 312 ms |
| 1 | 32 | yes | 99 ms | 117 ms |
| 1 | 2 | no | 98 ms | 318 ms |
| 1 | 2 | yes | 96 ms | 101 ms |
| 2 | 2 | no | 6.4 ms | 303 ms |
| 2 | 2 | yes | 7.0 ms | 103 ms |
In addition, we found that within _get_feature_views_to_use, all the time was being taken up by the following loop:
```python
for fv in self._registry.list_feature_views(
    self.project, allow_cache=allow_cache
):
    if hide_dummy_entity and fv.entities[0] == DUMMY_ENTITY_NAME:
        fv.entities = []
        fv.entity_columns = []
    feature_views.append(fv)
```
In particular, it looks like the body of the loop had negligible runtime, but executing Registry.list_feature_views for project 1 returned an iterator of 18 items (equal to the number of feature views in that project) over 102 ms, which works out to about 5.6 ms per iteration, roughly on par with the total time to retrieve the registry for project 2. I initially suspected that this function might be calling out to GCS sequentially, retrieving the registry data one feature view at a time rather than getting it all in one go. However, I was able to trace the time all the way down to calls to FeatureView.from_proto, which suggests that deserializing the protobuf might be the bottleneck. My profiler wasn't able to go any deeper for some reason, and I feel like I'm already in the weeds here anyway, so I'll leave it at that.
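In case it helps, here is a rough sketch of how one could check the deserialization cost in isolation, with no network calls involved. The proto import path and the local registry file name are assumptions on my part, not verified against feast 0.28:

```python
# Rough sketch: time FeatureView.from_proto over a locally downloaded copy of
# the registry. The proto import path and the local file name are assumptions.
import time

from feast.feature_view import FeatureView
from feast.protos.feast.core.Registry_pb2 import Registry as RegistryProto

registry_proto = RegistryProto()
with open("registry.db", "rb") as f:  # copy pulled down from the GCS bucket
    registry_proto.ParseFromString(f.read())

start = time.perf_counter()
views = [FeatureView.from_proto(fv_proto) for fv_proto in registry_proto.feature_views]
elapsed_ms = (time.perf_counter() - start) * 1000
print(f"deserialized {len(views)} feature views in {elapsed_ms:.1f} ms")
```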
Steps to reproduce
- Have a registry in a GCS bucket
- Have a redis online store
- Create a project with a bunch of feature views and features
- Create another project with very few feature views and features
- Query the data using FeatureStore.get_online_features and profile the functions FeatureStore._get_feature_views_to_use and FeatureStore._read_from_online_store (see the timing sketch below)
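A coarse end-to-end check of the caching behavior is just to time two identical back-to-back calls (feature reference and entity key are placeholders):

```python
# Hedged reproduction sketch: if the registry metadata were cached, the second
# call should spend noticeably less time resolving feature views than the first.
# "my_feature_view:my_feature" and "entity_id" are placeholders.
import time

from feast import FeatureStore

store = FeatureStore(repo_path=".")
request = dict(
    features=["my_feature_view:my_feature"],
    entity_rows=[{"entity_id": 123}],
)

for attempt in (1, 2):
    start = time.perf_counter()
    store.get_online_features(**request)
    print(f"call {attempt}: {(time.perf_counter() - start) * 1000:.1f} ms")
```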
Specifications
- Version: python==3.9.16, feast==0.28.0
- Platform: mac, linux
- Subsystem: ?
Possible Solution
I think the best solution is to figure out why the registry isn't being cached and fix that. All of our TTL parameters are set to very long periods of time, though, so I'm not sure why caching isn't happening.
Given the above observations, a potential workaround might be to limit the number of feature views in each of our projects. This is less than ideal though since we have leaned heavily into using the project and feature view names as a hierarchical namespace (different groups/teams have separate projects, different services and ML models within each team have separate feature views).
Seems related to https://github.com/feast-dev/feast/issues/3090
@ecotner I completely agree with you. I also experienced severe slowdown as the number of feature views increased. As a first step, we implemented a properly functioning cache with minimal code; this is currently a temporary solution. Check out this PR: https://github.com/feast-dev/feast/pull/3702
But this is only a stopgap, not the right solution. I think feast should stop serializing these objects as protocol buffers.