feast icon indicating copy to clipboard operation
feast copied to clipboard

feat: Refactor registry caching logic into a separate class

Open tokoko opened this issue 1 year ago • 2 comments

What this PR does / why we need it: Adds another abstract class CachingRegistry that extends BaseRegistry. It replaces BaseRegistry methods like get_entity with it's own abstract _get_entity and injects caching logic. Caching logic came from SqlRegistry, which was also refactored to use CachingRegistry as part of this PR.

P.S. I chose not to touch snowflake registry yet, because it isn't tested in CI.

Which issue(s) this PR fixes: Fixes #3940

tokoko avatar Feb 08 '24 17:02 tokoko

Hey why dont we add caching as a common feature of base registry class, so child classes can reuse it or override if needed I don't think we should implement as a standalone class

sudohainguyen avatar Feb 09 '24 02:02 sudohainguyen

@sudohainguyen Agreed, but I couldn't find a clean way to do it in base registry. Caching code is spread out in the constructor and all get_* and list_* methods. If optional caching is added directly in base, classes extending it would need some glue code in all those methods that calls out to cache, kinda similar to how it is today, but a bit less verbose probably. Do you have any ideas?

tokoko avatar Feb 09 '24 09:02 tokoko

So why do the SqlRegistry and SnowflakeRegistry has the logic of handling the caching, should it be in the Registry class?

HaoXuAI avatar Mar 05 '24 22:03 HaoXuAI

@HaoXuAI slight clarification, base class is called BaseRegistry. Registry is one of it's implementations for file-based registries. One problem in adding it directly in BaseRegistry is that we would have to update all implementations in one go. Maybe what we can do is start with a standalone class and after porting all implementations, fold this class into BaseRegistry. Another slight concern of mine is that, if we follow my approach of introducing new methods like _get_entity, _get_data_source, we would essentially be changing abstract interface in BaseRegistry and potentially screw up implementations that are not in main feast repo (any that we know of?). I don't know how okay we should be with that.

tokoko avatar Mar 06 '24 04:03 tokoko

@HaoXuAI slight clarification, base class is called BaseRegistry. Registry is one of it's implementations for file-based registries. One problem in adding it directly in BaseRegistry is that we would have to update all implementations in one go. Maybe what we can do is start with a standalone class and after porting all implementations, fold this class into BaseRegistry. Another slight concern of mine is that, if we follow my approach of introducing new methods like _get_entity, _get_data_source, we would essentially be changing abstract interface in BaseRegistry and potentially screw up implementations that are not in main feast repo (any that we know of?). I don't know how okay we should be with that.

right, BaseRegistry is abstract so better not to add implementations. is it possible to let Snowflake inherit from the SqlRegistry?

HaoXuAI avatar Mar 07 '24 15:03 HaoXuAI

@HaoXuAI that's an interesting idea. They don't really work the same way, sql registry creates multiple tables to save objects, snowflake uses just one. The thing is SqlRegistry depends only on sqlalchemy and in theory supports all databases with sqlalchemy dialect (including snowflake). The better question is probably if it will be a good idea to deprecate snowflake registry because of that. (The same applies to another contrib implementation PostgreSQLRegistryStore).

Having said that, I don't think these changes would make a separate caching mechanism irrelevant. We could also use it for RemoteRegistry for example. The PR you just merged for it doesn't have any caching yet.

tokoko avatar Mar 07 '24 21:03 tokoko

@HaoXuAI that's an interesting idea. They don't really work the same way, sql registry creates multiple tables to save objects, snowflake uses just one. The thing is SqlRegistry depends only on sqlalchemy and in theory supports all databases with sqlalchemy dialect (including snowflake). The better question is probably if it will be a good idea to deprecate snowflake registry because of that. (The same applies to another contrib implementation PostgreSQLRegistryStore).

Having said that, I don't think these changes would make a separate caching mechanism irrelevant. We could also use it for RemoteRegistry for example. The PR you just merged for it doesn't have any caching yet.

that make sense. I don't have a better solution yet, some kind of multi-inheritance maybe can fit into this situation? Anyway I think we can go with your implementation, and optimize it in the future.

HaoXuAI avatar Mar 09 '24 05:03 HaoXuAI

Sorry, merged another pr and caused a conflict. you might want to fix it :)

HaoXuAI avatar Mar 09 '24 05:03 HaoXuAI

@HaoXuAI no problem, thanks. it's ready

tokoko avatar Mar 09 '24 06:03 tokoko