foyer feat: Allow listing of keys

It would be great if we could get a list of the keys that are in the cache, so that I can invalidate entries in bulk without having to clear the entire cache.

Apr 22 '25 02:04 cetra3

Hi, Peter. Thank you for reporting.

There was a previous feature request for key iteration over the foyer cache, but I'm cautious about this feature. Because foyer provides thread-safe APIs, iterating over the cache requires holding a mutex, which may hurt the performance of concurrent requests. Besides, iteration comes with no ordering guarantee.
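
To make the locking concern concrete, here is a minimal sketch using a toy sharded map built on the standard library. It is an illustration under assumptions, not foyer's actual internals: `ShardedCache` and `keys_snapshot` are hypothetical names. Listing keys has to lock every shard in turn, so writers to a shard wait while it is being copied, and the result is neither ordered nor a consistent point-in-time view across shards.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};
use std::sync::Mutex;

/// Toy sharded cache (hypothetical; NOT foyer's real implementation).
pub struct ShardedCache<K, V> {
    shards: Vec<Mutex<HashMap<K, V>>>,
}

impl<K: Hash + Eq + Clone, V> ShardedCache<K, V> {
    pub fn new(shard_count: usize) -> Self {
        assert!(shard_count > 0, "need at least one shard");
        let shards = (0..shard_count)
            .map(|_| Mutex::new(HashMap::new()))
            .collect();
        Self { shards }
    }

    /// Pick the shard that owns `key` by hashing it.
    fn shard_of(&self, key: &K) -> &Mutex<HashMap<K, V>> {
        let mut hasher = DefaultHasher::new();
        key.hash(&mut hasher);
        &self.shards[(hasher.finish() as usize) % self.shards.len()]
    }

    /// Normal operations lock only the one shard that owns the key.
    pub fn insert(&self, key: K, value: V) -> Option<V> {
        self.shard_of(&key).lock().unwrap().insert(key, value)
    }

    /// Listing keys must visit every shard. Each shard stays locked while
    /// its keys are cloned, so concurrent requests to that shard block.
    /// The order is arbitrary, and entries may change in shards that were
    /// already visited before the function returns.
    pub fn keys_snapshot(&self) -> Vec<K> {
        let mut keys = Vec::new();
        for shard in &self.shards {
            let guard = shard.lock().unwrap();
            keys.extend(guard.keys().cloned());
        }
        keys
    }
}
```

The cost of `keys_snapshot` grows with the number of entries, while `insert` stays proportional to a single shard lookup, which is the asymmetry behind the caution above.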

May I ask for more details about your request? By "invalidate entries in bulk", do you mean you need to filter the cache by keys or by other properties?

Apr 22 '25 04:04 MrCroxx

There are two reasons for this:

1. I'm troubleshooting a memory leak/memory fragmentation issue. It would be great to know what the cache thinks it has in terms of total size of everything, so I can rule out what's happening. I have tried with metrics, but the foyer_memory_usage metric does not look like it gets changed when you call clear().
2. We're using foyer to cache byte ranges of files, essentially with a key of (path, start, end) and so when we delete a file we want to delete all the keys that match some predicate, rather than clear the entire cache.
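
Until key listing exists, one possible workaround for the second case is to keep a secondary index from path to the ranges cached for it, and to remove those keys one by one when the file is deleted. The sketch below is only an illustration: a plain `HashMap` stands in for the cache, and `RangeKey`, `IndexedRangeCache`, and `invalidate_file` are hypothetical names, not foyer APIs. With a real evicting cache, the index can also hold stale ranges for entries that were already evicted, so it would need its own cleanup.

```rust
use std::collections::{HashMap, HashSet};

/// Cache key for a byte range of a file, mirroring (path, start, end).
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
pub struct RangeKey {
    pub path: String,
    pub start: u64,
    pub end: u64,
}

/// A stand-in cache plus a side index of the ranges cached per path.
#[derive(Default)]
pub struct IndexedRangeCache {
    // Stand-in for the real cache (assumption: any map-like cache works).
    entries: HashMap<RangeKey, Vec<u8>>,
    // Secondary index: path -> set of (start, end) ranges inserted for it.
    by_path: HashMap<String, HashSet<(u64, u64)>>,
}

impl IndexedRangeCache {
    /// Insert into the cache and record the range in the index.
    pub fn insert(&mut self, key: RangeKey, bytes: Vec<u8>) {
        self.by_path
            .entry(key.path.clone())
            .or_default()
            .insert((key.start, key.end));
        self.entries.insert(key, bytes);
    }

    pub fn get(&self, key: &RangeKey) -> Option<&Vec<u8>> {
        self.entries.get(key)
    }

    /// Remove every cached range of `path`; returns how many entries were
    /// actually removed from the cache.
    pub fn invalidate_file(&mut self, path: &str) -> usize {
        let ranges = self.by_path.remove(path).unwrap_or_default();
        let mut removed = 0;
        for (start, end) in ranges {
            let key = RangeKey { path: path.to_string(), start, end };
            if self.entries.remove(&key).is_some() {
                removed += 1;
            }
        }
        removed
    }
}
```

This trades extra memory and bookkeeping on every insert for a bulk delete that never scans the whole cache.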

Apr 22 '25 06:04 cetra3

> I'm troubleshooting a memory leak/memory fragmentation issue. It would be great to know what the cache thinks it has in terms of total size of everything, so I can rule out what's happening. I have tried with metrics, but the foyer_memory_usage metric does not look like it gets changed when you call clear().

Let me help investigate why the effect of clear() isn't reflected in the metrics.

Btw, using tikv-jemalloc with jeprof can be helpful for debugging OOM issues, and ASAN can be used to debug memory leaks. (foyer has an ASAN test workflow in CI.)

> We're using foyer to cache byte ranges of files, essentially with a key of (path, start, end) and so when we delete a file we want to delete all the keys that match some predicate, rather than clear the entire cache.

Got it. RisingWave has a similar problem when caching LSM-tree SST blocks. Maybe I can provide a concurrent iterator that can help in this case. However, I'm still concerned about the performance impact of using these APIs alongside other concurrent requests. 🤔
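
One shape such an API could take, sketched here purely as an assumption and not as foyer's design, is to visit the shards one at a time so that only a single shard is ever locked. The hypothetical `remove_matching` below applies a key predicate with `HashMap::retain` per shard. Requests that hit other shards proceed normally, but entries inserted into an already visited shard during the pass are not seen, so the removal is not atomic across the cache.

```rust
use std::collections::HashMap;
use std::hash::Hash;
use std::sync::Mutex;

/// Remove every entry whose key satisfies `pred`, locking one shard at a
/// time. Returns the number of entries removed. (Hypothetical helper over
/// a toy shard layout; not a foyer API.)
pub fn remove_matching<K, V, F>(shards: &[Mutex<HashMap<K, V>>], mut pred: F) -> usize
where
    K: Hash + Eq,
    F: FnMut(&K) -> bool,
{
    let mut removed = 0;
    for shard in shards {
        // Only this shard is blocked; the guard is dropped at the end of
        // each loop iteration, before the next shard is locked.
        let mut guard = shard.lock().unwrap();
        let before = guard.len();
        guard.retain(|key, _| !pred(key));
        removed += before - guard.len();
    }
    removed
}
```

For the (path, start, end) keys described above, the predicate would compare only the path component, for example `|k| k.0 == deleted_path` on a tuple key.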

Apr 22 '25 13:04 MrCroxx

I'll give another use case: I wanted to add a debug endpoint that gives me some info on what is in the cache. I'd mostly use it to debug during local development, I don't care about performance or consistency. But currently I can't get anything out of Foyer even if I am willing to accept these issues. I understand why you might not want to implement IntoIterator or something like that where one can unknowingly opt into undesired behavior but if it's called list_keys_dangerous or whatever that should not be a concern.

Sep 04 '25 20:09 adriangb

> I'll give another use case: I wanted to add a debug endpoint that gives me some info on what is in the cache. I'd mostly use it to debug during local development, I don't care about performance or consistency. But currently I can't get anything out of Foyer even if I am willing to accept these issues. I understand why you might not want to implement IntoIterator or something like that where one can unknowingly opt into undesired behavior but if it's called list_keys_dangerous or whatever that should not be a concern.

Sounds reasonable. Let me think about whether there is a better solution that keeps the API consistent.

Sep 05 '25 06:09 MrCroxx

Hey there! I was about to open a similar issue: being able to list keys is a requirement in our query-engine case 😄

Big +1

Sep 29 '25 10:09 theotzen