feat: Initial foyer layer implementation
Closes https://github.com/apache/opendal/issues/5678
I open this as a draft to get some feedback about what I'm doing right now. There are missing tests and documentation, but I would like to get some feedback before working more on this
My main concern is, what would be the scope of this cache layer? Should we only cache read calls? Should we also cache list or stat calls? If so, I could implement that also in this PR, but I want to know if it makes sense first. Right now, only async read calls are cached
I will fix CI, add tests and documentation once this initial feedback!
Thanks a lot~
Hi, I've got a question: since it's a cache, when do we clear out expired data ? I couldn't spot the code handling that.
@meteorgan I think there is no concept of TTL in foyer. You specify a max size in bytes of the cache, and it will be filled until the cache is full. If a new key is going yo be inserted with the cache being full, another entry is evicted (see https://foyer.rs/docs/tutorial/in-memory-cache#22-count-entries-by-weight)
@meteorgan I think there is no concept of TTL in foyer. You specify a max size in bytes of the cache, and it will be filled until the cache is full. If a new key is going yo be inserted with the cache being full, another entry is evicted (see https://foyer.rs/docs/tutorial/in-memory-cache#22-count-entries-by-weight)
What I mean is, if a key gets deleted or updated, we should refresh the cache. Otherwise, we'll end up with stale data.
What I mean is, if a key gets deleted or updated, we should refresh the cache. Otherwise, we'll end up with stale data.
That's true. However, foyer has a lot of room for optimization when handling deletion and updatable data for now. Currently, in almost all user scenarios in Foyer, deleted data is not read, and existing data is not updated. Although Foyer has already made relevant designs and optimizations (e.g., tombstone log to ensure that delete is also append-only), it is not commonly used, so it still needs to be validated and optimized.
Therefore, for scenarios that require access to deleted data or updatable needs, I think Foyer should not be considered for use in the production environment for the time being.
However, based on my current experience, such scenarios are not very common.
Moreover, other similar cache systems also have some corner cases or performance issues in terms of consistency. We can discuss together how to improve it.
Btw, foyer will not require serde as a must after v0.16.x (not released yet, but ready to release anytime.
I plan to do more refactoring in v0.16 to avoid frequently breaking existing user dependencies. However, if it is a blocker for this PR, please feel free to contact me for the release. 🙏
Hi,
I had been busy these days and couldn't work on this further. I will resume my work as soon as I can, thank everyone for their review & comments!
@meteorgan
What I mean is, if a key gets deleted or updated, we should refresh the cache. Otherwise, we'll end up with stale data.
Yes, we could do that in the Operator that does the write operation, but files in storages can be modified by external sources (or other process in the same machine) and I don't know if it makes sense to just evict the cache key on the writer, other readers will have the same problem and have not easy way to detect those changes. I don't know how to proceed with evictions here. We could take a look to other implementations such as RisingWave's and see what they do regarding to cache invalidation
I don't know how to proceed with evictions here.
It's fine to just ignore evictions if foyer doesn't design for that so far. We just need to make it clear of FoyerLayer's API docs to make sure users know about this.
Changed PR title to foyer layer instead.
Hi, folks. Thank you for your contribution and effort in working on this PR.
As the author of foyer, I also hope to see Foyer better integrated with OpenDAL. If you are willing, I can take over this PR and continue with the adaptation.
If you are willing, I can take over this PR and continue with the adaptation.
I would like to continue with this if possible, I have more time this ongoing week to push this further, if everyone is okay with it...
If you are willing, I can take over this PR and continue with the adaptation.
I would like to continue with this if possible, I have more time this ongoing week to push this further, if everyone is okay with it...
No problem. And feel free to ask for any help from my side. 🥰
Hi, @MrCroxx, I believe it's better to cut off a release now so that @jorgehermo9 can work on the base without serde directly.
Hi, @MrCroxx, I believe it's better to cut off a release now so that @jorgehermo9 can work on the base without serde directly.
Sure. I'll release a new version in hours.
Yes, we could do that in the
Operatorthat does thewriteoperation, but files in storages can be modified by external sources (or other process in the same machine) and I don't know if it makes sense to just evict the cache key on the writer, other readers will have the same problem and have not easy way to detect those changes. I don't know how to proceed with evictions here. We could take a look to other implementations such as RisingWave's and see what they do regarding to cache invalidation
I completely agree with you, these are indeed tricky scenarios. I think we could invalidate the cache during write and delete operations, ensuring that as least the same process can always read the most recent data it has written. For the data modified by other processes, the simplest solution might be to evict data based on resident time, which users could configure. However, it would be wise to investigate further and reference to other implementations.
Hi, @MrCroxx, I believe it's better to cut off a release now so that @jorgehermo9 can work on the base without serde directly.
Sure. I'll release a new version in hours.
Done. foyer v0.16.0 is released.
Thanks for the review @Xuanwo, I was still working on the previous reviews and didn't have all things done, thats why I didnt mentioned you yet. I will work a lot more on this today though :)
Didn't forgot/ignore the previous comments, I have all of them in mind, just had been working slowly on them..
Hi @Xuanwo, I addressed all the comments I could, but left some doubts on opened threads, could you revisit that?
Hi @Xuanwo, I think I addressed everything (left some comments in a few threads). I plan to address documentation & tests once we are confident about how the final implementation looks like
Thank you very much for your reviews, I found them very learnable.