Support in memory SQLite online store
**Is your feature request related to a problem? Please describe.** When creating a SQLite online store, your only option is to create it on the filesystem. Since every access needs to hit the filesystem, this slows down the online store.
**Describe the solution you'd like**
I'd like an option, `:memory:`, to use an in-memory SQLite store instead. E.g. in `feature_store.yaml`:
```yaml
online_store:
  type: sqlite
  path: :memory:
```
Currently if you do this, feast still tries to interpret this as a path 🙃
**Describe alternatives you've considered**
An alternative would be to change the key from `path`; however, it's common to use `:memory:` as the path.
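A minimal sketch of the special-casing this request implies, assuming a hypothetical `resolve_online_store_path` helper rather than Feast's actual code path: keep the `path` key, but pass SQLite's special names straight through to `sqlite3.connect()` instead of resolving them against the repo directory.

```python
from pathlib import Path

# Hypothetical helper, not Feast's real implementation: only treat the configured
# value as a filesystem path when it actually is one.
def resolve_online_store_path(repo_path: Path, configured_path: str) -> str:
    if configured_path == ":memory:" or configured_path.startswith("file:"):
        # SQLite's in-memory name (or a sqlite URI) goes straight to sqlite3.connect().
        return configured_path
    return str(repo_path / configured_path)

# A plain filename is resolved relative to the repo; ":memory:" is left alone.
print(resolve_online_store_path(Path("/repo"), "online.db"))  # /repo/online.db
print(resolve_online_store_path(Path("/repo"), ":memory:"))   # :memory:
```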
@lukedyer-peak Hi Luke! Can you open a pr for this feature? I would be happy to review it!
I think we support this with the new SQL registry out of the box. Let me find a working example.
@lukedyer-peak https://github.com/feast-dev/feast/pull/3012 adds an example of how to do this.
Oh hah I just misread this issue entirely. This refers to the online store (and not the registry).
@kevjumba is right, this will require a PR (which we'd be happy to review).
@lukedyer-peak are you working on this? If you aren't, I'd like to take a stab at this as a first contribution.
@austinguo550 I haven't had time to start working on it, so happy for you to do it. I don't expect it to be a big PR at all! (Although I don't know the codebase.)
sounds good, I'll take a gander then!
@lukedyer-peak After looking into this ticket I think I may be confused or am misunderstanding the use-case. When you want to use the in-memory sqlite database as an online store, you plan to use the feast SDK in an application of yours, correct? Unless I am missing something, if you were planning to use the feast CLI to set up a store, I'm not sure that would work because of sqlite limitations. Specifically, the limitations are that
- each connection to an in-memory sqlite database creates a distinct in-memory database by default
- the in-memory database is wiped out when all connections to that database are closed
This means that the only ways to keep an in-memory sqlite database alive are to
- keep a persistent connection open and pass that connection around within the same process when doing database ops. Because there is exclusive access to memory, this should theoretically be the most performant option.
- enable the `cache=shared` flag for the sqlite in-memory database and ensure that >=1 connection to the in-memory database exists at all times. Because this is shared access to memory, this is theoretically less performant. (Both behaviors are sketched in the snippet after this list.)
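A minimal sketch of both limitations using Python's standard `sqlite3` module (the database name `feast_online` is just an illustrative placeholder):

```python
import sqlite3

# Limitation 1: each plain ":memory:" connection gets its own private database.
a = sqlite3.connect(":memory:")
b = sqlite3.connect(":memory:")
a.execute("CREATE TABLE t (x INTEGER)")
# b.execute("SELECT * FROM t")  # would raise OperationalError: no such table: t

# Limitation 2 / workaround: a shared-cache in-memory database is visible to every
# connection in the process, but only lives while at least one connection is open.
uri = "file:feast_online?mode=memory&cache=shared"
keepalive = sqlite3.connect(uri, uri=True)  # holds the database alive

writer = sqlite3.connect(uri, uri=True)
writer.execute("CREATE TABLE features (entity_key TEXT, value TEXT)")
writer.commit()
writer.close()  # the database survives because keepalive is still open

reader = sqlite3.connect(uri, uri=True)
print(reader.execute("SELECT count(*) FROM features").fetchone())  # (0,)
reader.close()

keepalive.close()  # last connection closed -> the in-memory database is wiped
```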
With these restrictions I can think of a few ways to move forward:
- use feast SDK in your application to manage your feature store, and keep a persistent connection open to keep the sqlite in-memory database alive. Your application will run in a single process. In order for this to work you will need to somehow know how long you need to keep this database alive so you can set the sqlite connection timeout (when it times out your database will be wiped), so this is probably not the best option
- use feast SDK in your application to manage your feature store, enable `cache=shared` on the sqlite in-memory database, and keep >=1 connection open at all times to keep the sqlite in-memory database alive. Feast will attempt to ensure >=1 connection to the in-memory database exists at all times so your database can stay alive indefinitely. Your application can run in a single process, or Feast will have to add support for creating a sqlite in-memory database in shared memory for multiple processes/child processes to use.
- set up a process which keeps a sqlite in-memory database alive at all times and services requests from other processes to do ops on this in-memory database. In this process we enable `cache=shared` on the sqlite in-memory database and keep >=1 connection open at all times to keep it alive. Feast will send all sqlite ops to this process to execute. This would likely take the most work to implement and may not be appropriate for Feast to maintain, since Feast would be responsible for provisioning a daemon on your host and managing its lifecycle. That would be a hassle, since users can destroy Feast feature repos by removing the directory without interacting with the Feast CLI at all, so that daemon wouldn't know when sqlite databases are no longer being used.
- [doesn't need a feature request] ditch the sqlite in-memory feature entirely, but approximate in-memory performance by using in-memory disk storage paradigms like tmpfs, ramfs, ramdisk, etc. (see the config sketch after this list). Then you won't have to worry about cross-process access or keeping >=1 connection alive at all times, and you have the option to move to a persistent sqlite database when you need it.
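For that last option, a minimal sketch of what the tmpfs workaround could look like in `feature_store.yaml`, assuming a Linux host where `/dev/shm` is tmpfs (the exact path is only an example):

```yaml
online_store:
  type: sqlite
  # RAM-backed path: reads and writes avoid real disk I/O while Feast still
  # treats this as an ordinary file-based SQLite online store.
  path: /dev/shm/feature_repo/online_store.db
```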
Any ideas? Or am I missing something that would make this feature much simpler to implement?
@achals @kevjumba as maintainers of this repo I'd love to hear what you think as well
Yeah, I think what would actually be ideal is to have an in-process cache (the in-memory SQLite online store) for the feature server (when deploying `feast serve`). Then you can init the online store to live longer than a single request. This then serves as a cache for values in a more robust, longer-living database.
i.e. the online store is initialized as part of `feature_server.py`
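A rough sketch of that idea (not Feast's actual `feature_server.py`): a FastAPI app holds a shared-cache in-memory SQLite connection for the lifetime of the server process so the cache outlives individual requests. The endpoint, table schema, and database name below are all hypothetical.

```python
import sqlite3

from fastapi import FastAPI

app = FastAPI()

# Hypothetical shared-cache in-memory database; "online_cache" is an illustrative name.
_CACHE_URI = "file:online_cache?mode=memory&cache=shared"
# Held for the whole server lifetime so the in-memory database outlives single requests.
_keepalive = sqlite3.connect(_CACHE_URI, uri=True, check_same_thread=False)
_keepalive.execute(
    "CREATE TABLE IF NOT EXISTS online (entity_key TEXT, feature TEXT, value TEXT)"
)
_keepalive.commit()

@app.get("/cached-features/{entity_key}")  # hypothetical endpoint, not Feast's real API
def cached_features(entity_key: str):
    # Each request opens its own short-lived connection; the shared-cache database
    # stays alive because _keepalive is never closed while the server runs.
    conn = sqlite3.connect(_CACHE_URI, uri=True)
    rows = conn.execute(
        "SELECT feature, value FROM online WHERE entity_key = ?", (entity_key,)
    ).fetchall()
    conn.close()
    return {"entity_key": entity_key, "features": dict(rows)}
```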
We can do that, but wouldn't we still lose the tables created in the database between executions of the `feast apply` and `feast serve` commands? I can think of a few ways to move forward from the limitations we discussed, but it would help to clarify the use-case, specifically the user's workflow and expected behavior.
What should be the behavior for these cases?

**Scenario 1**

| Time step | Shell 1 |
| --- | --- |
| 1 | `feast init feature_repo; cd feature_repo; feast apply; CURRENT_TIME=$(date -u +"%Y-%m-%dT%H:%M:%S") && feast materialize-incremental $CURRENT_TIME; feast serve` |
Should the sqlite db be empty when we run `feast serve` because we only use an in-memory sqlite db? Or should we persist materialized features to the db on disk across commands like `feast apply` and `materialize-incremental`?
**Scenario 2**

| Time step | Shell 1 | Shell 2 |
| --- | --- | --- |
| 1 | `feast init feature_repo; cd feature_repo; feast apply; feast serve` | write features to feature server, read features from feature server |
| 2 | read features from online store using feature service | |
Should we fail to read features in time step 2 because the in-memory database isn't persisted after feature serving? Or should we persist features written to the feature store while the feature server is running?
@austinguo550 Hey Austin, I think what Danny is mentioning is more like scenario 2, where the in-memory database does not support materialization prior to running the feature server, as sqlite doesn't support persistence in memory unless we manually persist it to a file.
I think the implementation can just support an "alive" mode in the feature server, and you don't need to worry about persistence when the feature server is down, as that doesn't make a lot of sense anyways.
As discussed in the feast dev weekly, we aren't really sure what the requirements are here or whether this is an antipattern, so closing. Feel free to reopen.