feast icon indicating copy to clipboard operation
feast copied to clipboard

Awkward Deletion logic for Redis

Open felixwang9817 opened this issue 3 years ago • 10 comments

The Redis data model is not ideal. Specifically, feature data is stored in Redis with a key that only references the join key values, not the FV names. This means that two FVs sharing the same entities might share the same prefix in Redis, which results in awkward deletion logic. See #2240 for more details.

felixwang9817 avatar Jan 28 '22 07:01 felixwang9817

As mentioned in #2240 there are significant advantages to the current Redis data model. For example with the current model the minimum number of keys needed for any request is E where E is the number of unique entities in the request (this assumes that the OnlineStore interface is changed to be able to query more than a single FV at a time). With the change to including the FV name in the key prefix this would increase to sum(E_n * F_n) where E_n is the number of unique entities for the nth Feature View required and F_n is the nth FeatureView required.

This would hurt the theoretical optimal latency of online serving. Given that offline use cases (materialization, dataset creation, registry management etc.) are much less latency sensitive it feels wrong to me to prioritize code simplicity over online performance.

I'd argue that the Redis data model is better than other OnlineStores but that the interface between the FeatureStore and OnlineStore do not allow for adequate information for the Redis OnlineStore to 'do the right thing'. This applies for both populating/pruning the OnlineStore and for serving keys optimally.

judahrand avatar Jan 28 '22 10:01 judahrand

@judahrand those are good points. As you noted, one of the barriers to that theoretical optimal latency is the ability to query for multiple FVs at a time. I created #2281 to track that issue and suggest a potential solution!

In terms of the Redis data model, an alternative would be:

fv_name -> {
  entity_row_1 -> val_1,
  entity_row_2 -> val_2,
}

This is basically the inverse of what we have now. It seems to me that (provided we can query for multiple FVs) at a time that each data model has its own benefits. If you plan on retrieving many entity rows per feature, then this alternative makes sense, but if you just want all the features for a single entity row, then the current model makes sense. Either way I don't think there's any immediate changes to make, just thoughts to keep in mind for later.

cc @pyalex @DvirDukhan in case you guys want to chime in

felixwang9817 avatar Feb 04 '22 00:02 felixwang9817

If you plan on retrieving many entity rows per feature, then this alternative makes sense, but if you just want all the features for a single entity row, then the current model makes sense.

Hmmm... I wonder if this actually works out 🤔 I know @pyalex you found that about 50 fields per key HGETALL was much much faster than HMGET. So, I wonder how these two data models would compare in reality? There might be some Redis specific reason to pick one or the other?

judahrand avatar Feb 04 '22 09:02 judahrand

Something that comes to mind here is also materialization speed:

IIUC, Redis likely will be much more efficient in materialization if we use feature views instead. You can set multiple entity keys within the same feature view hash instead of forcing pipelines to execute on individual hsets.

cc @DvirDukhan to comment more though on some of this

adchia avatar Feb 07 '22 16:02 adchia

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] avatar Jun 12 '22 11:06 stale[bot]

Another thing that's unideal about the redis data model is that we don't clean up keys for entities when we remove a feature view. This makes the data store size grow unboundedly.

achals avatar Jun 23 '22 19:06 achals

I believe the current data model also doesn't allow the natural use of expire given a Feature View TTL and #1802 will probably require custom expire strategies or custom logic after retrieval.

roelschr avatar Jun 28 '22 07:06 roelschr

@judahrand I just learned (the hard way) that the redis data model is not really being optimally used for retrieval and I saw you mentioned this in one of your comments above:

this assumes that the OnlineStore interface is changed to be able to query more than a single FV at a time.

Are there any plans to change the OnlineStore interface in the future? Our use case has a lot of FVs with few features each and I understand we could be using Redis better here.

roelschr avatar Aug 04 '22 15:08 roelschr

I think it'd be fine to change this to query multiple FV at the same time since it'd be fairly trivial to make sure that all existing online store implementations of e.g. a

Could imagine adding a online_read_multi method that takes in a list of FeatureViews or a FeatureService and returns a List[List[...]]

The default behavior for an Online store can be to serially call the original online_read method to make it backwards compatible.

adchia avatar Aug 04 '22 18:08 adchia

@roelschr would you mind starting a PR for this?

adchia avatar Aug 04 '22 18:08 adchia

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] avatar Dec 16 '22 03:12 stale[bot]