feast icon indicating copy to clipboard operation
feast copied to clipboard

Support in memory SQLite online store

Open lukedyer-peak opened this issue 2 years ago • 13 comments

Is your feature request related to a problem? Please describe. When creating a SQLite online store your only option is to create it on the filesystem. As every access needs to hit the filesystem then this slows down the online store.

Describe the solution you'd like I'd like an option :memory: to use an in memory SQLite store instead. Eg in feature_store.yaml:

online_store:
    type: sqlite
    path: :memory:

Currently if you do this feast still tries to interpret this as a path 🙃

Describe alternatives you've considered An alternative would be to change the key from path however it's common to use :memory: as the path.

Additional context Add any other context or screenshots about the feature request here.

lukedyer-peak avatar Aug 03 '22 09:08 lukedyer-peak

@lukedyer-peak Hi Luke! Can you open a pr for this feature? I would be happy to review it!

kevjumba avatar Aug 03 '22 22:08 kevjumba

I think we support this with the new SQL registry out of the box. Let me find a working example.

achals avatar Aug 03 '22 23:08 achals

@lukedyer-peak https://github.com/feast-dev/feast/pull/3012 adds an example of how to do this.

achals avatar Aug 04 '22 19:08 achals

Oh hah I just misread this issue entirely. This refers to the online store (and not the registry).

@kevjumba is right, this will require a PR (which we'd be happy to review).

achals avatar Aug 04 '22 19:08 achals

@lukedyer-peak are you working on this? If you aren't I'd like to take a stab at this as a first contribution

austinguo550 avatar Aug 08 '22 17:08 austinguo550

@austinguo550 I haven't had time to start to work on it so happy for you to do it. I don't expect it to be a big PR at all! (Although I don't know the codebase)

lukedyer-peak avatar Aug 10 '22 10:08 lukedyer-peak

sounds good, I'll take a gander then!

austinguo550 avatar Aug 16 '22 16:08 austinguo550

@lukedyer-peak After looking into this ticket I think I may be confused or am misunderstanding the use-case. When you want to use the in-memory sqlite database as an online store, you plan to use the feast SDK in an application of yours, correct? Unless I am missing something, if you were planning to use the feast CLI to set up a store, I'm not sure that would work because of sqlite limitations. Specifically, the limitations are that

  1. each connection to an in-memory sqlite database creates a distinct in-memory database by default
  2. the in-memory database is wiped out when all connections to that database are closed

This means that the only ways to keep an in-memory sqlite database alive are to

  1. keep a persistent connection open and pass that connection around within the same process when doing database ops. Because there is exclusive access to memory this should theoretically be the most performant option.
  2. enable the cache=shared flag for sqlite in-memory database and ensure that >=1 connection to the in-memory database exists at all times. Because this is shared access to memory this is theoretically less performant.

With these restrictions I can think of a few ways to move forward:

  1. use feast SDK in your application to manage your feature store, and keep a persistent connection open to keep the sqlite in-memory database alive. Your application will run in a single process. In order for this to work you will need to somehow know how long you need to keep this database alive so you can set the sqlite connection timeout (when it times out your database will be wiped), so this is probably not the best option
  2. use feast SDK in your application to manage your feature store, and enable cache=shared on the sqlite in-memory database, keep >=1 connection open at all times to keep the sqlite in-memory database alive. Feast will attempt to ensure >=1 connection to the in-memory database exists at all times so your database can stay alive indefinitely. Your application can run in a single process, or Feast will have to add support for creating a sqlite in-memory database in shared memory for multiple processes/child processes to use.
  3. set up a process which keeps a sqlite in-memory database alive at all times and services requests from other processes to do ops on this in-memory database. In this process we enable cache=shared on the sqlite in-memory database, keep >=1 connection open at all times to keep the sqlite in-memory database alive. Feast will send all sqlite ops to this process to execute. This would likely take the most work to implement and may not be appropriate for Feast to maintain since Feast would be responsible for provisioning a daemon on your host and managing its lifecycle - that would be a hassle since users can destroy Feast feature repos by removing the directory without interacting with the Feast CLI at all, so that daemon wouldn't know when sqlite databases are no longer being used.
  4. [doesn't need feature request] ditch the sqlite in-memory sqlite feature entirely but approximate in-memory performance by using in-memory disk storage paradigms like tmpfs, ramfs, ramdisk etc. Then you won't have to worry about cross-process access, keeping >=1 connection alive at all times, and have the option to move to a persistent sqlite database when you need it

Any ideas? Or am I missing something that would make this feature much simpler to implement?

austinguo550 avatar Aug 24 '22 20:08 austinguo550

@achals @kevjumba as maintainers of this repo I'd love to hear what you think as well

austinguo550 avatar Aug 24 '22 20:08 austinguo550

Yeah I think what is actually ideally would be to have an in process cache (the in memory SQLite online store) for the feature server (when deploying feast serve). Then you can init the online store to live longer than a single request. This then serves as a cache for values in a more robust longer living database.

adchia avatar Aug 30 '22 22:08 adchia

i.e. the online store is initialized as part of feature_server.py

adchia avatar Aug 30 '22 22:08 adchia

We can do that, but wouldn't we still lose the tables created in the database between execution of feast apply and feast serve commands? I can think of a few ways to move forward from the limitations we discussed but it would help to clarify the use-case. Specifically the user's workflow and expected behavior.

What should be the behavior for these cases? Scenario 1

Time step Shell 1
1 feast init feature_repo; cd feature_repo; feast apply; CURRENT_TIME=$(date -u +"%Y-%m-%dT%H:%M:%S") && feast materialize-incremental $CURRENT_TIME; feast serve

Should the sqlite db be empty when we run feast serve because we only use in-memory sqlite db? Or should we persist materialized features to the db on disk across commands like feast apply and materialize-incremental?

Scenario 2

Time step Shell 1 Shell 2
1 feast init feature_repo; cd feature_repo; feast apply; feast serve write features to feature server, read features from feature server
2 read features from online store using feature service

Should we fail to read features in time step 2 because the in-memory database isn't persisted after feature serving? Or should we persist features written to the feature store while the feature server is running?

austinguo550 avatar Aug 31 '22 18:08 austinguo550

@austinguo550 Hey Austin, I think what Danny is mentioning seems more like scenario 2 where the in memory database does not support materialization prior to running the feature server as sqlite doesn't support persistence in memory unless we manually persist it to file.

I think the implementation can just support "alive" mode in the feature server and you don't need to worry about persistence when the feature server is down as that doesn't make a lot of sense anyways.

kevjumba avatar Sep 07 '22 20:09 kevjumba

As discussed in the feast dev weekly, we aren't really sure what the requirements are here or whether this is an antipattern, so closing. Feel free to reopen

adchia avatar Sep 27 '22 15:09 adchia