opendal icon indicating copy to clipboard operation
opendal copied to clipboard

feat: Initial foyer layer implementation

Open jorgehermo9 opened this issue 9 months ago • 18 comments

Closes https://github.com/apache/opendal/issues/5678

I open this as a draft to get some feedback about what I'm doing right now. There are missing tests and documentation, but I would like to get some feedback before working more on this

My main concern is, what would be the scope of this cache layer? Should we only cache read calls? Should we also cache list or stat calls? If so, I could implement that also in this PR, but I want to know if it makes sense first. Right now, only async read calls are cached

I will fix CI, add tests and documentation once this initial feedback!

Thanks a lot~

jorgehermo9 avatar Mar 28 '25 16:03 jorgehermo9

Hi, I've got a question: since it's a cache, when do we clear out expired data ? I couldn't spot the code handling that.

meteorgan avatar Mar 30 '25 13:03 meteorgan

@meteorgan I think there is no concept of TTL in foyer. You specify a max size in bytes of the cache, and it will be filled until the cache is full. If a new key is going yo be inserted with the cache being full, another entry is evicted (see https://foyer.rs/docs/tutorial/in-memory-cache#22-count-entries-by-weight)

jorgehermo9 avatar Mar 30 '25 14:03 jorgehermo9

@meteorgan I think there is no concept of TTL in foyer. You specify a max size in bytes of the cache, and it will be filled until the cache is full. If a new key is going yo be inserted with the cache being full, another entry is evicted (see https://foyer.rs/docs/tutorial/in-memory-cache#22-count-entries-by-weight)

What I mean is, if a key gets deleted or updated, we should refresh the cache. Otherwise, we'll end up with stale data.

meteorgan avatar Mar 31 '25 03:03 meteorgan

What I mean is, if a key gets deleted or updated, we should refresh the cache. Otherwise, we'll end up with stale data.

That's true. However, foyer has a lot of room for optimization when handling deletion and updatable data for now. Currently, in almost all user scenarios in Foyer, deleted data is not read, and existing data is not updated. Although Foyer has already made relevant designs and optimizations (e.g., tombstone log to ensure that delete is also append-only), it is not commonly used, so it still needs to be validated and optimized.

Therefore, for scenarios that require access to deleted data or updatable needs, I think Foyer should not be considered for use in the production environment for the time being.

However, based on my current experience, such scenarios are not very common.

Moreover, other similar cache systems also have some corner cases or performance issues in terms of consistency. We can discuss together how to improve it.

MrCroxx avatar Apr 03 '25 06:04 MrCroxx

Btw, foyer will not require serde as a must after v0.16.x (not released yet, but ready to release anytime.

I plan to do more refactoring in v0.16 to avoid frequently breaking existing user dependencies. However, if it is a blocker for this PR, please feel free to contact me for the release. 🙏

MrCroxx avatar Apr 03 '25 06:04 MrCroxx

Hi,

I had been busy these days and couldn't work on this further. I will resume my work as soon as I can, thank everyone for their review & comments!

@meteorgan

What I mean is, if a key gets deleted or updated, we should refresh the cache. Otherwise, we'll end up with stale data.

Yes, we could do that in the Operator that does the write operation, but files in storages can be modified by external sources (or other process in the same machine) and I don't know if it makes sense to just evict the cache key on the writer, other readers will have the same problem and have not easy way to detect those changes. I don't know how to proceed with evictions here. We could take a look to other implementations such as RisingWave's and see what they do regarding to cache invalidation

jorgehermo9 avatar Apr 06 '25 21:04 jorgehermo9

I don't know how to proceed with evictions here.

It's fine to just ignore evictions if foyer doesn't design for that so far. We just need to make it clear of FoyerLayer's API docs to make sure users know about this.

Xuanwo avatar Apr 07 '25 02:04 Xuanwo

Changed PR title to foyer layer instead.

Xuanwo avatar Apr 07 '25 02:04 Xuanwo

Hi, folks. Thank you for your contribution and effort in working on this PR.

As the author of foyer, I also hope to see Foyer better integrated with OpenDAL. If you are willing, I can take over this PR and continue with the adaptation.

MrCroxx avatar Apr 07 '25 02:04 MrCroxx

If you are willing, I can take over this PR and continue with the adaptation.

I would like to continue with this if possible, I have more time this ongoing week to push this further, if everyone is okay with it...

jorgehermo9 avatar Apr 07 '25 07:04 jorgehermo9

If you are willing, I can take over this PR and continue with the adaptation.

I would like to continue with this if possible, I have more time this ongoing week to push this further, if everyone is okay with it...

No problem. And feel free to ask for any help from my side. 🥰

MrCroxx avatar Apr 07 '25 07:04 MrCroxx

Hi, @MrCroxx, I believe it's better to cut off a release now so that @jorgehermo9 can work on the base without serde directly.

Xuanwo avatar Apr 07 '25 09:04 Xuanwo

Hi, @MrCroxx, I believe it's better to cut off a release now so that @jorgehermo9 can work on the base without serde directly.

Sure. I'll release a new version in hours.

MrCroxx avatar Apr 07 '25 09:04 MrCroxx

Yes, we could do that in the Operator that does the write operation, but files in storages can be modified by external sources (or other process in the same machine) and I don't know if it makes sense to just evict the cache key on the writer, other readers will have the same problem and have not easy way to detect those changes. I don't know how to proceed with evictions here. We could take a look to other implementations such as RisingWave's and see what they do regarding to cache invalidation

I completely agree with you, these are indeed tricky scenarios. I think we could invalidate the cache during write and delete operations, ensuring that as least the same process can always read the most recent data it has written. For the data modified by other processes, the simplest solution might be to evict data based on resident time, which users could configure. However, it would be wise to investigate further and reference to other implementations.

meteorgan avatar Apr 07 '25 09:04 meteorgan

Hi, @MrCroxx, I believe it's better to cut off a release now so that @jorgehermo9 can work on the base without serde directly.

Sure. I'll release a new version in hours.

Done. foyer v0.16.0 is released.

MrCroxx avatar Apr 07 '25 10:04 MrCroxx

Thanks for the review @Xuanwo, I was still working on the previous reviews and didn't have all things done, thats why I didnt mentioned you yet. I will work a lot more on this today though :)

Didn't forgot/ignore the previous comments, I have all of them in mind, just had been working slowly on them..

jorgehermo9 avatar Apr 15 '25 12:04 jorgehermo9

Hi @Xuanwo, I addressed all the comments I could, but left some doubts on opened threads, could you revisit that?

jorgehermo9 avatar Apr 15 '25 15:04 jorgehermo9

Hi @Xuanwo, I think I addressed everything (left some comments in a few threads). I plan to address documentation & tests once we are confident about how the final implementation looks like

Thank you very much for your reviews, I found them very learnable.

jorgehermo9 avatar Apr 16 '25 12:04 jorgehermo9