[REQUEST] Limiting cache
What is the feature you're proposing?
As of right now, caching has no limit.
I feel like there should be a way to set a limit for each type of cache, preferably in the Client
instance.
Something like:
```python
bot = interactions.Client(
    ...,
    cache_limits=interactions.CacheLimits(
        messages=100,
        guilds=50,
        ...,
    ),
)
```
This is the best thing that I can think of off the top of my head. Another way is to ~~do what dis-snek did~~ do something with kwargs and have something like this:
```python
bot = interactions.Client(
    ...,
    message_cache=100,
    guild_cache=50,
    ...,
)
```
Additional Information
- [ ] My feature request is related to an existing Issue.
- Issue (if referenceable):
Garbage collection is a pretty contested topic when it comes to how a library handles its data. For the most part, the cache is only populated on demand, so it has to be filled through a listened event or by creating something through the Web API. I'm not sure we particularly need limits, because while they do help with memory size, hard-locking how big a cache can be has its own drawbacks.

I think TTLing unused data in the cache would be a better approach than limiting how big the cache can be. However, TTLing information would need some work on how the data is accessed. If we want to check this reliably, we'd have to either force users checking the cache to use the current `view()` method, or use the `attrs` module, which I have been looking into.
Ok, I see. What I was thinking was to remove the first item(s) in the `OrderedDict` when the cache hits the limit. I don't know what TTLing is, but I like the current method of using `get()`, though `view()` could work, I guess.
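The "remove the first item(s) in the `OrderedDict`" idea above is FIFO eviction; a minimal sketch with a plain `OrderedDict` (the class and method names here are illustrative, not the library's actual `Storage` API):

```python
from collections import OrderedDict


class BoundedCache:
    """Minimal FIFO-bounded cache: evicts the oldest insertion once full."""

    def __init__(self, max_size: int) -> None:
        self.max_size = max_size
        self._items: OrderedDict = OrderedDict()

    def add(self, key, value) -> None:
        self._items[key] = value
        # Evict the first-inserted item(s) once the limit is hit.
        while len(self._items) > self.max_size:
            self._items.popitem(last=False)

    def get(self, key, default=None):
        return self._items.get(key, default)


cache = BoundedCache(max_size=2)
cache.add("a", 1)
cache.add("b", 2)
cache.add("c", 3)      # "a" (the oldest insertion) is evicted
print(cache.get("a"))  # None
print(cache.get("c"))  # 3
```

Note this evicts by insertion order only; it ignores how often an item is read.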
TTL (time to live): store a timestamp when an entry is stored/accessed, then either have a job/task that checks these timestamps at intervals to invalidate/remove entries from the cache, or check each entry every time the cache is accessed. Hope that helps :)
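The lazy-check variant of that idea (validate the timestamp on access rather than in a background task) can be sketched like this; names are illustrative, not the library's API:

```python
import time


class TTLCache:
    """Stores (value, expiry) pairs; expired entries are dropped on access."""

    def __init__(self, ttl_seconds: float) -> None:
        self.ttl = ttl_seconds
        self._items: dict = {}

    def add(self, key, value) -> None:
        # Record when this entry should stop being served.
        self._items[key] = (value, time.monotonic() + self.ttl)

    def get(self, key, default=None):
        entry = self._items.get(key)
        if entry is None:
            return default
        value, expiry = entry
        if time.monotonic() >= expiry:
            # Entry outlived its TTL: invalidate it and report a miss.
            del self._items[key]
            return default
        return value


cache = TTLCache(ttl_seconds=0.05)
cache.add("msg", "hello")
print(cache.get("msg"))  # "hello" while fresh
time.sleep(0.1)
print(cache.get("msg"))  # None after the TTL expires
```

The trade-off versus a background task: no extra task to run, but expired entries linger in memory until someone touches them.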
Ohh I see, thanks!
This proposal may be amended into #909; please make a comment referencing it there.
I'd say while #909 mostly fixes it, it may still be useful to have something like this for messages, as they can grow more or less unbounded in most servers, since many messages are never deleted.
Ideally for me, you would be able to configure the cache size (with a reasonable default) and use the last accessed time as above or a combination of number of times accessed and last accessed time to determine what to purge when the cache limit is hit.
So when adding an item to the cache:
- If the cache is full, find the least-used item and remove it
- Add the new item to the cache
This way you don't need to have another task responsible for pruning the cache and you don't unnecessarily remove items from the cache if there is no reason to.
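The evict-on-insert policy described above is essentially LRU. A sketch using `OrderedDict.move_to_end` (an assumption for illustration, not the library's implementation):

```python
from collections import OrderedDict


class LRUCache:
    """Evicts the least-recently-used entry only when an add overflows."""

    def __init__(self, max_size: int) -> None:
        self.max_size = max_size
        self._items: OrderedDict = OrderedDict()

    def get(self, key, default=None):
        if key not in self._items:
            return default
        # Mark as most recently used so it survives the next eviction.
        self._items.move_to_end(key)
        return self._items[key]

    def add(self, key, value) -> None:
        if key in self._items:
            self._items.move_to_end(key)
        elif len(self._items) >= self.max_size:
            # Cache is full: drop the least-recently-used item first.
            self._items.popitem(last=False)
        self._items[key] = value


cache = LRUCache(max_size=2)
cache.add("a", 1)
cache.add("b", 2)
cache.get("a")             # touch "a"; "b" becomes least recently used
cache.add("c", 3)          # evicts "b"
print(list(cache._items))  # ['a', 'c']
```

As the comment above notes, no background pruning task is needed: eviction only happens when an insert actually exceeds the limit.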
Thanks
Due to there being no cache TTL, my app's memory grows over a few days of usage.
Before switching to interactions.py my memory was stable. As soon as I switched (you can tell from the dotted line) the memory increased everyday and never was collected.
Are you able to share what the most frequently cached objects are? I'm guessing it's between guilds and interactions.
I don't explicitly cache anything myself and I'm not sure how to see the full caching of interactions / guilds. My bots are in 1000+ servers so I'm assuming it's the interactions.
If you know how I can view it let me know and I'll share it.
You can get access to the cache by doing `client._http.cache`. You can do `cache[<type>]` to get the Storage of that type, or `cache.storages` to get a dict of type -> Storage. Once you have the storage, you can do `len(storage.values)` to see how many items are cached of that type.
Code example:
```python
cache = client._http.cache
for type, storage in cache.storages.items():
    print(f"{type}: {len(storage.values)}")
```
It looks like `<class 'interactions.api.models.message.Message'>` is constantly growing, and I'm assuming that's causing the memory issues.
Starting:
```
CACHE CHECK
-------------------
<class 'interactions.api.models.guild.Guild'>: 1086
<class 'interactions.api.models.channel.Channel'>: 41631
<class 'interactions.api.models.role.Role'>: 28583
<class 'interactions.api.models.member.Member'>: 1153
<class 'interactions.api.models.channel.Thread'>: 160
<class 'interactions.api.models.message.Message'>: 596
<class 'interactions.api.models.gw.MessageReaction'>: 0
```
30 mins later:
```
<class 'interactions.api.models.guild.Guild'>: 1087
<class 'interactions.api.models.channel.Channel'>: 41657
<class 'interactions.api.models.role.Role'>: 28596
<class 'interactions.api.models.member.Member'>: 1158
<class 'interactions.api.models.channel.Thread'>: 161
<class 'interactions.api.models.message.Message'>: 3361
<class 'interactions.api.models.gw.MessageReaction'>: 0
<class 'interactions.api.models.gw.ChannelPins'>: 0
<class 'interactions.api.models.gw.GuildJoinRequest'>: 0
<class 'interactions.api.models.gw.GuildStickers'>: 0
<class 'interactions.api.models.gw.GuildMember'>: 1
```
Between messages and channels, those seem to be cached the most aggressively. I suggest the developers look into implementing a `TTLMixin` for handling keepalive of cached items.
I think cache limits should be imposed and harshly enforced with a per-minute TTL. This means an amount up to the `max_size` length can exist for only `ttl` minutes.
| field | max_size | ttl (min) |
|---|---|---|
| Guild | 1000 | 60 |
| Channel * | 5000 | 300 |
| Role | 7000 | 240 |
| Member | 500 | 120 |
| Thread | 250 | 60 |
| Message * | 2250 | 180 |
\* `max_size` is determined by a rate of activity of 3 object creations per minute and the quotient produced from the given lengths of cached objects from @Dillonzer.
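Limits like the table above could be expressed as a per-type configuration mapping. This is only a sketch; `CacheLimit` and `CACHE_LIMITS` are hypothetical names, not an existing interactions.py API:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CacheLimit:
    max_size: int     # maximum number of cached objects of this type
    ttl_minutes: int  # minutes an entry may live before invalidation

# Proposed per-type defaults, matching the table above.
CACHE_LIMITS = {
    "Guild":   CacheLimit(max_size=1000, ttl_minutes=60),
    "Channel": CacheLimit(max_size=5000, ttl_minutes=300),
    "Role":    CacheLimit(max_size=7000, ttl_minutes=240),
    "Member":  CacheLimit(max_size=500,  ttl_minutes=120),
    "Thread":  CacheLimit(max_size=250,  ttl_minutes=60),
    "Message": CacheLimit(max_size=2250, ttl_minutes=180),
}

print(CACHE_LIMITS["Message"].max_size)  # 2250
```

A structure like this would also fit the `cache_limits=` constructor argument proposed at the top of the thread.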
@Dillonzer How many times are you working with roles? Are you doing anything with them, or is this purely from what's being cached? I'm astounded to see nearly 30,000 objects cached for them. That's insane!
@i0bs I don't do anything with roles. For context, all this bot does is take input from the user and spit out embeds from my API based on the user's query, with buttons to cycle through the different embeds created via the bot (e.g. they put in a card name and it spits out information about those cards).
Interesting. The cache is not intelligent enough to determine what is used the least. After testing, the cache can contain "duped" objects of the same ID (using this as our basis) while values have changed.
Making that work would additionally prove to be a challenge, as every object must perform a cache introspection to determine whether it exists and which fields to update. Rewriting the entire object would be extremely memory inefficient, so that would leave us changing specific attributes, which then runs into slotting and `DictSerializerMixin` hassles. TTL seems the best way to go. I don't think LRU or FIFO/FILO-based caching would be efficient here.
For clarification, I'm also trying to use interactions.py for a company contract, and I'm running into a plethora of caching problems. Most of them relate to "duped" objects, but the biggest problem by far has been the inability to control storage sizes per cacheable object. There's no way to performantly scale memory usage, and I can't opt out of things being cached without removing Gateway intents either.
> After testing, the cache can contain "duped" objects of the same ID (using this as our basis) while values have changed.
@i0bs can you elaborate on this? I don't see how this is possible, since multiple objects with the same type and id would just end with one either updating or overwriting the other
Closing this as the base issue has been resolved.