Remove _next_expiration field from CookieJar
Is your feature request related to a problem?
The _next_expiration field seems unused in CookieJar. It is only written to and never read in any part of aiohttp:
https://github.com/aio-libs/aiohttp/blob/a57dc3146839e2e1978b96db48e1de3af1a2bb50/aiohttp/cookiejar.py#L84
Describe the solution you'd like
Remove the field if it's useless
Describe alternatives you've considered
Maybe that it was intended for something that was never completely coded (e.g.,fast removal of expired cookies).
Related component
Client
Additional context
No response
Code of Conduct
- [X] I agree to follow the aio-libs Code of Conduct
Indeed this _next_expiration field was used to perform fast removal of expired cookies, but a posterior commits deleted the line of code that was using it: https://github.com/aio-libs/aiohttp/blob/d65f5cb4751651c7c43ac3744c92cbd4b23eec04/aiohttp/cookiejar.py#L87
Since that line of code has been deleted, _next_expiration is useless now (and the cleaning of cookies is slower as it doesn't use it to cut short the clear process).
The commit that deleted this line of code is https://github.com/aio-libs/aiohttp/pull/5249 by @derlih
As an alternative to removing _next_expiration entirely, this line of code could be reintroduced. However this is more difficult now, as this commit made the _do_expiration code to be removed, and call instead the clear method, which now serves additional needs, like allowing custom selection of cookies to be cleared through a callback function. To revert this, the old _do_expiration code shall be reinstated, instead of delegating the work to the clear method.
More largely, frequent execution paths (_do_expiration is called very often across the entire API) shall have simple code benefiting from shortcuts (like the _next_expiration field to determine if the expiration process can be skipped) and rare paths (like a clear operation with a custom selection function) could be isolated into dedicated code.
#7784 is also looking at removing duplicate calls to _do_expiration(). @bdraco Any opinion on whether this is worth reimplementing, or if it becomes a bit redundant after your changes?
Ideally the _next_expiration functionality would be restored.
In Home Assistant we see a lot of time spent in filter_cookies because sessions are shared between integrations which is aggravated by _do_expiration being called multiple times currently. Home Assistant tends to have a lot of cookies built up in the cookie jar which would make the restoration of the _next_expiration functionality quite helpful for performance since most calls to _do_expiration would return almost immediately.
Same here - I use aiohttp for a large-scale crawler. It works wonder, excepted the time it spends processing the cookie jar. The code seems complex and, potentially, not optimized for the frequent case (expiring cookies before selecting the set of active cookies to send to a website) but for rare cases (clearing cookies based on a custom logic).
py-spy of a request. Purple area is the time spent filtering cookies
Zoomed in
How many cookies were in the jar? I have several thousands (across a couple hundred domains)
I don't have a good way to inspect that as its from a live running instance, but I can clearly say no where near that many. I'd ballpark it to be 50-100 at most. I expect for your use case you probably spend the bulk of the request time in the cookie jar.
Yep that's what I see. The cookie jar is not scalable, specifically due to the expiration logic.
I didn't see this through py-spy but when switching the crawler from a DummyCookieJar to the standard CookieJar
Sounds like reintroducing it is the correct approach then.
In fact the entire logic could be switched in favor of a time-ordered expiry queue. That way, there is never any need to loop over every cookie. Only the ones in front of the queue (the ones due to expire soon) need to be checked. The loop is exited as soon as the next iterated cookie from that queue is not expired yet. The heapq standard library provides the priority queue algorithm for free.
Well, feel free to come up with a proposal, if it's clearly better without too much complexity, then we can merge it. Just bear in mind a 3rd PR currently in progress, that affects cookie performance is #7777.
Sounds great, I will wait for #7777 to be merged and then, depending on if it resolved this issue or not and the new capabilities it offers, make a proposal.
It doesn't change the expirations, but will reduce the number of cookies being looked at in filter_cookies(). Mainly just want to make sure you don't end up with a load of merge conflicts.
I understand, I will find the best path.
In case its help to test against. Below is a basic benchmark (can likely be improved)
import timeit
from http import cookies
from yarl import URL
from aiohttp import CookieJar
def filter_large_cookie_jar():
"""Filter out large cookies from the cookie jar."""
jar = CookieJar()
c = cookies.SimpleCookie()
domain_url = URL("http://maxagetest.com/")
other_url = URL("http://otherurl.com/")
for i in range(5000):
cookie_name = f"max-age-cookie{i}"
c[cookie_name] = "any"
c[cookie_name]["max-age"] = 60
c[cookie_name]["domain"] = "maxagetest.com"
jar.update_cookies(c, domain_url)
assert len(jar) == 5000
assert len(jar.filter_cookies(domain_url)) == 5000
assert len(jar.filter_cookies(other_url)) == 0
filter_domain = timeit.timeit(lambda: jar.filter_cookies(domain_url), number=1000)
filter_other_domain = timeit.timeit(
lambda: jar.filter_cookies(other_url), number=1000
)
print(f"filter_domain: {filter_domain}")
print(f"filter_other_domain: {filter_other_domain}")
filter_large_cookie_jar()
This is a great benchmark - will use it and publish sample measures for future reference
Sounds great, I will wait for #7777 to be merged and then, depending on if it resolved this issue or not and the new capabilities it offers, make a proposal.
PR is merged now, so feel free to propose further changes.