Cache Redesign
This is a tracking issue for feature requests, ideas, and design proposals for the cache system redesign I'll be looking at and working on for 0.6.0. The first task is probably replacing the regular RwLock-wrapped HashMaps with DHashMap
from the ccl
crate. Design proposals should also go here.
The implementation for the suggested alternative to HashMap
can be found at https://gitlab.nebulanet.cc/xacrimon/ccl/blob/master/src/dhashmap.rs
How about we make the caching calls more explicit? That would mean passing the cache implementation as an argument to every call/constructor explicitly. I'm not a fan of magic...
To properly understand the problem, though, maybe we could outline what we want to cache (and what we're already caching right now)?
That has been done in serenity 0.6.x. Well, you pass a Context, which holds a couple of other things as well, but it implements both AsRef<Http>
and AsRef<Cache>
.
Also, regarding what we want to cache: we want to cache pretty much everything that we can. We can cache messages, but that is disabled by default.
I'm going to draw up a Cache trait when I get home. Hold tight till then.
Alright. So we want to abstract the cache API away from the actual methods used to store the data. I think the best way to do this is to define a CacheBackend
trait that we implement for different backends. We then take the trait object and wrap it like this:
pub struct Cache {
    backend: Box<dyn CacheBackend>,
}
Finally we implement wrapper methods that call the backend.
This creates a clean solution and makes it easy to swap the cache if you so desire. It means we can first refactor the cache system with a well-tested map like hashbrown's, and then look into a better backend like DHashMap
.
Does this sound alright?
The new Cache struct would sit in the same place as the current cache does.
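A minimal sketch of the wrapper idea described above, assuming a made-up backend. ChannelId and GuildChannel here are simplified stand-ins, not serenity's real model types, and the method names are illustrative only:

```rust
use std::collections::HashMap;
use std::sync::Mutex;

// Hypothetical stand-ins for serenity's model types.
#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
struct ChannelId(u64);
#[derive(Clone, Debug)]
struct GuildChannel { name: String }

// The proposed trait: storage details live behind it.
trait CacheBackend {
    fn channel_insert(&self, id: ChannelId, ch: GuildChannel);
    fn channel_get(&self, id: ChannelId) -> Option<GuildChannel>;
}

// One possible backend: a Mutex-guarded HashMap.
struct MapBackend(Mutex<HashMap<ChannelId, GuildChannel>>);

impl CacheBackend for MapBackend {
    fn channel_insert(&self, id: ChannelId, ch: GuildChannel) {
        self.0.lock().unwrap().insert(id, ch);
    }
    fn channel_get(&self, id: ChannelId) -> Option<GuildChannel> {
        self.0.lock().unwrap().get(&id).cloned()
    }
}

// The public Cache wraps a boxed backend and exposes wrapper
// methods that simply delegate to it.
pub struct Cache { backend: Box<dyn CacheBackend> }

impl Cache {
    fn channel_insert(&self, id: ChannelId, ch: GuildChannel) {
        self.backend.channel_insert(id, ch);
    }
    fn channel_get(&self, id: ChannelId) -> Option<GuildChannel> {
        self.backend.channel_get(id)
    }
}

fn main() {
    let cache = Cache { backend: Box::new(MapBackend(Mutex::new(HashMap::new()))) };
    cache.channel_insert(ChannelId(1), GuildChannel { name: "general".into() });
    assert_eq!(cache.channel_get(ChannelId(1)).unwrap().name, "general");
}
```

Swapping the backend then only means constructing Cache with a different Box<dyn CacheBackend>; callers never see the change.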
I think it's a good idea to have an abstract cache layer, so I'm for it. That said, I need to get up to date with the current cache API, cause it seems like a lot has changed.
Regarding what's cached: I have only a very rough idea of whether efficient cache invalidation is possible in a distributed scenario. So, if we want to cache everything, that means the data sources for the cache are both the HTTP responses and the data retrieved through the websocket connections, right?
Yup @MOZGIII, those are the data sources. I'm not home yet, but I'm going to draw up a proposition for the cache trait sometime today. This abstract layer means we can plug in lots of backends, everything from a set of compact DHashMaps
to something like memcached for distributed caching.
The current cache API is just a RwLock'd struct containing a set of HashMaps, which makes it very inflexible, somewhat slow, and limited in concurrency.
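For reference, the pattern being described, one RwLock around each plain HashMap, looks roughly like this simplified sketch (not serenity's actual code; the field types are reduced for illustration). Note that every write takes an exclusive lock over the entire map:

```rust
use std::collections::HashMap;
use std::sync::RwLock;

// Simplified model of the current design: one coarse lock per map.
struct CurrentCache {
    channels: RwLock<HashMap<u64, String>>, // id -> channel name, simplified
}

impl CurrentCache {
    fn insert(&self, id: u64, name: &str) {
        // A single-entry write blocks ALL readers of the whole map.
        self.channels.write().unwrap().insert(id, name.to_string());
    }

    fn get(&self, id: u64) -> Option<String> {
        self.channels.read().unwrap().get(&id).cloned()
    }
}

fn main() {
    let cache = CurrentCache { channels: RwLock::new(HashMap::new()) };
    cache.insert(1, "general");
    assert_eq!(cache.get(1).as_deref(), Some("general"));
}
```

A sharded or lock-free map would avoid the whole-map write lock, which is the concurrency limit mentioned above.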
Alright. I have a rough design proposal for how to glue the abstract layer together with everything else. It looks somewhat like this. This is not a complete example, but it should get the gist of the design across.
pub struct ChannelVirtualMap<'a> {
    cache: &'a Cache,
}

impl<'a> ChannelVirtualMap<'a> {
    pub fn insert(&self, id: ChannelId, val: Arc<RwLock<GuildChannel>>) {
        self.cache.channels_insert(id, val);
    }

    // add other methods
}
pub struct Cache {
    backend: Box<dyn CacheBackend>,
}

impl Deref for Cache {
    type Target = Box<dyn CacheBackend>;

    fn deref(&self) -> &Box<dyn CacheBackend> {
        &self.backend
    }
}
pub trait CacheBackend {
    fn channels_list(&self) -> Box<dyn Iterator<Item = (ChannelId, Arc<RwLock<GuildChannel>>)> + '_>;
    fn channels_get(&self, id: ChannelId) -> Option<&Arc<RwLock<GuildChannel>>>;
    fn channels_insert(&self, id: ChannelId, val: Arc<RwLock<GuildChannel>>);
    fn channels_get_mut(&self, id: ChannelId) -> Option<&mut Arc<RwLock<GuildChannel>>>;
    fn channels_map(&self) -> ChannelVirtualMap;

    // Add more methods as necessary, and repeat for the other things we want
    // to cache. Adjust the virtual map type accordingly (hash sets etc.), or
    // omit it entirely depending on the item. Add other virtual types for
    // smooth interaction between the cache and the rest of the code; the
    // virtual types should make adjusting the rest of the library
    // significantly easier.
}
What's the point of having a ChannelVirtualMap
?
Since the current API uses a HashMap, having a virtual map in its place would probably make switching easier.
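The facade idea could look roughly like this. This is a hypothetical sketch with the types simplified (u64/String instead of serenity's model types): a thin type that presents a HashMap-like surface to callers but forwards every call to the Cache, so code written against cache.channels can keep its shape after the swap:

```rust
use std::collections::HashMap;
use std::sync::RwLock;

// Simplified cache; the real one would delegate to a backend.
struct Cache {
    channels: RwLock<HashMap<u64, String>>,
}

impl Cache {
    fn channels_insert(&self, id: u64, name: String) {
        self.channels.write().unwrap().insert(id, name);
    }
    fn channels_get(&self, id: u64) -> Option<String> {
        self.channels.read().unwrap().get(&id).cloned()
    }
    // Hand out the facade instead of the raw map.
    fn channels_map(&self) -> ChannelVirtualMap<'_> {
        ChannelVirtualMap { cache: self }
    }
}

// The "virtual map": HashMap-like methods, but no storage of its own.
struct ChannelVirtualMap<'a> {
    cache: &'a Cache,
}

impl<'a> ChannelVirtualMap<'a> {
    fn insert(&self, id: u64, name: String) {
        self.cache.channels_insert(id, name);
    }
    fn get(&self, id: u64) -> Option<String> {
        self.cache.channels_get(id)
    }
}

fn main() {
    let cache = Cache { channels: RwLock::new(HashMap::new()) };
    let map = cache.channels_map();
    map.insert(1, "general".to_string());
    assert_eq!(map.get(1).as_deref(), Some("general"));
}
```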
Also, the methods should probably return a Deref<[mytype]>
or an AsRef<[mytype]>
instead.
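One possible reading of that suggestion, sketched with a lock guard as the Deref implementor (the field and type here are made up for illustration): returning impl Deref<Target = T> instead of &T lets a backend hand out whatever handle it likes, a read guard, an Arc, a plain reference, without changing the trait's surface:

```rust
use std::ops::Deref;
use std::sync::RwLock;

struct Cache {
    name: RwLock<String>, // hypothetical cached field
}

impl Cache {
    // RwLockReadGuard<'_, String> implements Deref<Target = String>,
    // so callers can use the return value like a &String while the
    // backend stays free to swap in a different handle type later.
    fn name(&self) -> impl Deref<Target = String> + '_ {
        self.name.read().unwrap()
    }
}

fn main() {
    let cache = Cache { name: RwLock::new("serenity".to_string()) };
    let n = cache.name();
    assert_eq!(&*n, "serenity");
}
```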
Erk and I are going to revisit this once 0.6 ships and he has free time.
I don't think we want such a redesign. Here's what I can think of that users might want from custom caches:
- custom caching policies - can be implemented using callbacks or otherwise runtime configuring the built-in Cache
- only caching resources they need to - implemented in #2210 and can be expanded upon
- only caching model type fields they need to (cc @GnomedDev) - won't work with the Cache trait approach anyways, because the trait methods return whole model type instances, so you can't selectively cache only certain fields
- more efficient data structures - if someone can optimize the data structures, they should contribute to upstream serenity
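The callback idea in the first bullet could be sketched like this. All names here are hypothetical, not serenity's actual API: the built-in cache consults a user-supplied predicate before storing an item, which gives users a custom caching policy without a pluggable backend:

```rust
// Hypothetical message type, reduced to what the policy inspects.
struct Message {
    author_is_bot: bool,
    content: String,
}

// User-configurable policy; the field name is illustrative only.
struct CacheSettings {
    cache_message: Box<dyn Fn(&Message) -> bool + Send + Sync>,
}

struct Cache {
    settings: CacheSettings,
    messages: Vec<Message>,
}

impl Cache {
    // The built-in cache asks the user's predicate before storing.
    fn store(&mut self, msg: Message) {
        if (self.settings.cache_message)(&msg) {
            self.messages.push(msg);
        }
    }
}

fn main() {
    // Policy: never cache messages sent by bots.
    let settings = CacheSettings { cache_message: Box::new(|m: &Message| !m.author_is_bot) };
    let mut cache = Cache { settings, messages: Vec::new() };
    cache.store(Message { author_is_bot: true, content: "beep".into() });
    cache.store(Message { author_is_bot: false, content: "hi".into() });
    assert_eq!(cache.messages.len(), 1);
}
```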
Anyone else want to chime in? This issue has had no activity for 4 years and I feel like none of the serenity devs are interested in redesigning the cache (correct me if I'm wrong?), so maybe we can close this issue
I'm no longer stripping models in a fork, because it's a complete pain to keep updated; but unless there was a feature flag per field, I can't imagine caching only the model fields that are needed would be possible.