
[REQUEST] Limiting cache

Toricane opened this issue 2 years ago • 7 comments

What is the feature you're proposing? As of right now, caching has no limit. I feel like there should be a way to set a limit for each type of cache, preferably in the Client instance.

Something like:

bot = interactions.Client(
    ...,
    cache_limits=interactions.CacheLimits(
        messages=100,
        guilds=50,
        ...
    )
)

This is the best thing that I can think of off the top of my head. Another way is to ~~do what dis-snek did~~ do something with kwargs and have something like this:

bot = interactions.Client(
    ...,
    message_cache=100,
    guild_cache=50,
    ...
)

Additional Information

  • [ ] My feature request is related to an existing Issue.
    • Issue (if referenceable):

Toricane avatar Feb 07 '22 03:02 Toricane

Garbage collection is a hotly debated topic when it comes to how a library handles its data. For the most part, the cache is populated on demand, so entries have to come in through a listened Gateway event or through something created via the Web API. I'm not sure we particularly need limits; while they do help with memory size, hard-locking how big a cache can be has its own problems.

I think TTLing unused data in the cache would be a better approach than limiting how big the cache can be. However, TTLing information would need some work done on how the data is accessed. If we want to check this reliably, we'd have to either force users checking the cache to use the current view() method, or use the attrs module, which I have been looking into.

i0bs avatar Feb 07 '22 03:02 i0bs

Ok, I see. What I was thinking was to remove the first item(s) from the OrderedDict when the cache hits the limit. I don't know what TTLing is, but I like the current method of using get(); view() could work too, I guess.

Toricane avatar Feb 07 '22 03:02 Toricane
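The idea above — dropping the first-inserted item(s) from an OrderedDict once a limit is hit — can be sketched like this. This is a minimal FIFO illustration, not interactions.py's actual Storage API; the class and method names are hypothetical:

```python
from collections import OrderedDict

class FIFOCache:
    """Evicts the oldest-inserted entries once max_size is exceeded."""

    def __init__(self, max_size):
        self.max_size = max_size
        self._values = OrderedDict()

    def add(self, key, value):
        self._values[key] = value
        # Drop the first-inserted item(s) when over the limit.
        while len(self._values) > self.max_size:
            self._values.popitem(last=False)

    def get(self, key, default=None):
        return self._values.get(key, default)
```

Note that this evicts purely by insertion order, so a frequently used entry that happens to be old is dropped just as readily as a stale one — which is part of why TTL and LRU schemes come up later in this thread.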

> Ok I see What I was thinking was to remove the first item(s) in the OrderedDict when the cache hit the limit. I don't know what TTLing is, but I like the current method of using get(), but view() could work I guess.

TTL (time to live): store a timestamp when an entry is stored or accessed, then either have a job/task that checks these timestamps at intervals to invalidate and remove entries from the cache, or check each entry every time the cache is accessed. Hope that helps :)

tagptroll1 avatar Feb 08 '22 08:02 tagptroll1
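The second variant described above — checking entries on access rather than running a background task — can be sketched as follows. This is an illustrative example, not interactions.py code; the class name and the choice to refresh the timestamp on read are assumptions:

```python
import time

class TTLCache:
    """Entries expire ttl seconds after they were last stored or accessed."""

    def __init__(self, ttl):
        self.ttl = ttl
        self._values = {}  # key -> (timestamp, value)

    def add(self, key, value):
        self._values[key] = (time.monotonic(), value)

    def get(self, key, default=None):
        entry = self._values.get(key)
        if entry is None:
            return default
        stamp, value = entry
        if time.monotonic() - stamp > self.ttl:
            # Lazily invalidate on access instead of using a background task.
            del self._values[key]
            return default
        # Refresh the timestamp so actively used entries stay alive.
        self._values[key] = (time.monotonic(), value)
        return value
```

The lazy approach avoids a scheduled task, but expired entries that are never read again linger in memory until touched, so a real implementation would likely combine it with periodic sweeping.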

Ohh I see, thanks!

Toricane avatar Feb 08 '22 15:02 Toricane

Proposal may be amended to #909 , please make a comment referencing there.

i0bs avatar Jul 11 '22 01:07 i0bs

I'd say while #909 mostly fixes it, it may still be useful having something like this for messages, as they can grow more or less unbounded in most servers since many messages are never deleted.

AstreaTSS avatar Jul 27 '22 04:07 AstreaTSS

Ideally for me, you would be able to configure the cache size (with a reasonable default) and use the last-accessed time as above, or a combination of access count and last-accessed time, to determine what to purge when the cache limit is hit.

So when adding an item to the cache,

  1. If the cache is full, find the least-used item and remove it
  2. Add the new item to the cache

This way you don't need a separate task responsible for pruning the cache, and you don't unnecessarily remove items from the cache when there is no reason to.

Thanks

holdur-ground avatar Aug 05 '22 03:08 holdur-ground
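The evict-on-insert scheme described above is essentially an LRU cache, which an OrderedDict expresses naturally. A minimal sketch, with hypothetical names, assuming "least used" means least recently accessed:

```python
from collections import OrderedDict

class LRUCache:
    """Capped cache that evicts the least recently used entry when full."""

    def __init__(self, max_size):
        self.max_size = max_size
        self._values = OrderedDict()

    def add(self, key, value):
        if key in self._values:
            # Re-adding an existing key refreshes its recency.
            self._values.move_to_end(key)
        self._values[key] = value
        if len(self._values) > self.max_size:
            # Step 1 above: cache is full, drop the least recently used item.
            self._values.popitem(last=False)

    def get(self, key, default=None):
        if key not in self._values:
            return default
        # Accessing an entry marks it as most recently used.
        self._values.move_to_end(key)
        return self._values[key]
```

The combined score (access count plus recency) suggested above would replace the simple `popitem(last=False)` with a scan for the lowest-scoring entry, trading O(1) eviction for finer-grained retention.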

Because there is no cache TTL, my app's memory grows over the course of a few days.

[image: memory usage graph]

Before switching to interactions.py my memory was stable. As soon as I switched (you can tell from the dotted line), the memory increased every day and was never collected.

Dillonzer avatar Oct 03 '22 15:10 Dillonzer

Are you able to share what the most frequently cached objects are? I'm guessing it's between guilds and interactions.

i0bs avatar Oct 03 '22 16:10 i0bs

I don't explicitly cache anything myself and I'm not sure how to see the full caching of interactions / guilds. My bots are in 1000+ servers so I'm assuming it's the interactions.

If you know how I can view it let me know and I'll share it.

Dillonzer avatar Oct 03 '22 21:10 Dillonzer

You can get access to the cache by doing client._http.cache. You can do cache[<type>] to get to the Storage of that type, or cache.storages to get a dict of type->Storage. Once you have the storage, you can do len(storage.values) to see how many items are cached of that type.

Code example:

cache = client._http.cache
for obj_type, storage in cache.storages.items():
    print(f"{obj_type}: {len(storage.values)}")

Catalyst4222 avatar Oct 03 '22 22:10 Catalyst4222

It looks like <class 'interactions.api.models.message.Message'> is constantly growing and I'm assuming that's causing the memory issues.

starting:

CACHE CHECK
-------------------
<class 'interactions.api.models.guild.Guild'>: 1086
<class 'interactions.api.models.channel.Channel'>: 41631
<class 'interactions.api.models.role.Role'>: 28583
<class 'interactions.api.models.member.Member'>: 1153
<class 'interactions.api.models.channel.Thread'>: 160
<class 'interactions.api.models.message.Message'>: 596
<class 'interactions.api.models.gw.MessageReaction'>: 0

30 mins later:

<class 'interactions.api.models.guild.Guild'>: 1087
<class 'interactions.api.models.channel.Channel'>: 41657
<class 'interactions.api.models.role.Role'>: 28596
<class 'interactions.api.models.member.Member'>: 1158
<class 'interactions.api.models.channel.Thread'>: 161
<class 'interactions.api.models.message.Message'>: 3361
<class 'interactions.api.models.gw.MessageReaction'>: 0
<class 'interactions.api.models.gw.ChannelPins'>: 0
<class 'interactions.api.models.gw.GuildJoinRequest'>: 0
<class 'interactions.api.models.gw.GuildStickers'>: 0
<class 'interactions.api.models.gw.GuildMember'>: 1

Dillonzer avatar Oct 04 '22 17:10 Dillonzer

Between messages and channels, those seem to be cached the most aggressively. I suggest the developers look into implementing a TTLMixin for handling the keepalive of cached items.

I think cache limits should be imposed and harshly enforced alongside a TTL measured in minutes. This means up to max_size entries of a type can exist, each for only ttl minutes.

| field    | max_size | ttl (min.) |
|----------|----------|------------|
| Guild    | 1000     | 60         |
| Channel* | 5000     | 300        |
| Role     | 7000     | 240        |
| Member   | 500      | 120        |
| Thread   | 250      | 60         |
| Message* | 2250     | 180        |

* max_size is estimated from an activity rate of 3 object creations per minute and the quotient produced from the cached-object counts reported by @Dillonzer

i0bs avatar Oct 04 '22 18:10 i0bs
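The proposal above pairs a per-type max_size with a per-type TTL. One way to sketch a storage that enforces both bounds at once (illustrative only — the class name and eviction order are assumptions, not the library's design):

```python
import time
from collections import OrderedDict

class BoundedTTLStorage:
    """Per-type storage enforcing both a max_size cap and a TTL in seconds."""

    def __init__(self, max_size, ttl):
        self.max_size = max_size
        self.ttl = ttl
        self._values = OrderedDict()  # key -> (timestamp, value)

    def _expire(self):
        now = time.monotonic()
        # Entries are kept in insertion/refresh order, so the oldest
        # timestamps sit at the front of the OrderedDict.
        while self._values:
            key, (stamp, _) = next(iter(self._values.items()))
            if now - stamp <= self.ttl:
                break
            del self._values[key]

    def add(self, key, value):
        self._expire()
        self._values[key] = (time.monotonic(), value)
        self._values.move_to_end(key)  # keep the oldest-first invariant
        # Size cap on top of the TTL: evict oldest entries beyond max_size.
        while len(self._values) > self.max_size:
            self._values.popitem(last=False)

    def get(self, key, default=None):
        self._expire()
        entry = self._values.get(key)
        return entry[1] if entry is not None else default
```

A per-minute TTL as proposed in the table would simply pass `ttl=minutes * 60` when constructing each type's storage.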

@Dillonzer How many times are you working with roles? Are you doing anything with them, or is this purely from what's being cached? I'm astounded to see nearly 30,000 objects cached for them. That's insane!

i0bs avatar Oct 04 '22 18:10 i0bs

@i0bs I don't do anything with roles. For context, all this bot does is takes input from the user and spits out embeds, w/ buttons to cycle through the different embeds created via the bot, from my API based on the users query. (ex, they put in a card name and it spits out information about those cards).

Dillonzer avatar Oct 04 '22 19:10 Dillonzer

Interesting. The cache is not intelligent enough to determine what is used the least. After testing, the cache can contain "duped" objects of the same ID (using this as our basis) while values have changed.

Making that work would additionally prove to be a challenge, as every object must perform a cache introspection to determine whether it already exists and which fields to update. Rewriting the entire object would be extremely memory-inefficient, so that leaves us with changing specific attributes, which runs into slotting and DictSerializerMixin hassles. TTL seems the best way to go; I don't think LRU or FIFO/FILO-based caching would be efficient here.

i0bs avatar Oct 05 '22 04:10 i0bs

For clarification, I'm also trying to use interactions.py for a company contract, and I'm running into a plethora of caching problems. Most of them relate to "duped" objects, but the biggest problem by far has been the inability to control storage sizes per cacheable object. There's no way to performantly scale memory usage, and I can't opt out of things being cached without removing Gateway intents either.

i0bs avatar Oct 05 '22 04:10 i0bs

After testing, the cache can contain "duped" objects of the same ID (using this as our basis) while values have changed.

@i0bs can you elaborate on this? I don't see how this is possible, since multiple objects with the same type and id would just end with one either updating or overwriting the other

Catalyst4222 avatar Oct 21 '22 01:10 Catalyst4222

Closing this as the base issue has been resolved.

EdVraz avatar Nov 10 '22 17:11 EdVraz