Slow performance when merging cookies

Open gyula-lakatos opened this issue 2 years ago • 3 comments
  • [X] Initially raised as discussion #2874

Requests that use cookies suffer a significant performance degradation.

httpx._client.BaseClient._merge_cookies calls into the cookiejar here:

        if cookies or self.cookies:
            merged_cookies = Cookies(self.cookies)
            merged_cookies.update(cookies)
            return merged_cookies
        return cookies

Whenever this check evaluates the truthiness of a Cookies instance, httpx._models.Cookies.__bool__ iterates over the jar, which calls http.cookiejar.CookieJar.__iter__ and, through it, the module-level deepvalues function in http.cookiejar.

httpx._models.Cookies.__bool__:

    def __bool__(self) -> bool:
        for _ in self.jar:
            return True
        return False

http.cookiejar.CookieJar.__iter__:

    def __iter__(self):
        return deepvalues(self._cookies)

deepvalues is a significant performance hog: it recursively walks the jar's nested mappings each time the jar is iterated.
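To make the cost concrete, here is a minimal sketch that uses only the standard library and does not touch httpx itself. It assumes a jar populated with 1000 single-cookie domains; make_cookie and jar_has_cookies are hypothetical helpers written for this illustration, with jar_has_cookies mirroring the shape of Cookies.__bool__ shown above. Timings will vary by machine and Python version, so no expected numbers are given.

```python
import timeit
from http.cookiejar import Cookie, CookieJar


def make_cookie(name: str, domain: str) -> Cookie:
    """Hypothetical helper: build a minimal session cookie for `domain`."""
    return Cookie(
        version=0, name=name, value="x", port=None, port_specified=False,
        domain=domain, domain_specified=True, domain_initial_dot=False,
        path="/", path_specified=True, secure=False, expires=None,
        discard=True, comment=None, comment_url=None, rest={},
    )


def jar_has_cookies(jar: CookieJar) -> bool:
    """Same shape as Cookies.__bool__: start iterating, stop at the first cookie."""
    for _ in jar:  # CookieJar.__iter__ delegates to deepvalues()
        return True
    return False


# Assumption for the benchmark: many domains, one cookie each.
jar = CookieJar()
for i in range(1000):
    jar.set_cookie(make_cookie("session", f"host{i}.example"))

# Compare the iteration-based emptiness check with a plain identity check.
iteration_time = timeit.timeit(lambda: jar_has_cookies(jar), number=10_000)
none_check_time = timeit.timeit(lambda: jar is not None, number=10_000)
print(f"iteration-based check: {iteration_time:.4f}s")
print(f"`is not None` check:   {none_check_time:.4f}s")
```

Note that the two checks are not equivalent: the identity check cannot tell an empty jar from a populated one, so swapping one for the other changes which branch is taken when the jar is empty.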

A suggested solution is to check cookies and self.cookies for None instead of using __bool__.

I attached a profiler snapshot to illustrate the problem (image). In it, deepvalues takes up almost 17% of the CPU time.

gyula-lakatos, Oct 02 '23 17:10

Pulling in the stdlib http.cookiejar and urllib.request modules brings in quite a lot of submerged complexity... I'd wonder if we really need that for the httpx.Cookies implementation.

  • Could we instead be using a plain dictionary format under-the-hood for the cookie persistence?
  • Which cookie attributes does httpx honor? For example, does "expires" actually take effect?
  • Can we point to a really clear easy-to-read functional description of http cookies?

lovelydinosaur, Nov 03 '23 15:11

> Could we instead be using a plain dictionary format under-the-hood for the cookie persistence?

I think the biggest obstacle here is that the jar attribute of the Cookies class is part of the public interface: https://github.com/encode/httpx/blob/fbe35add82032d119d60c7f4de9bce2ccb12f6a1/docs/api.md?plain=1#L154

Users may want to access it in order to customize the cookie policy (which is set during initialization and accessed when adding a header). So a compatibility layer may be warranted, and it can get complex.

Other than enforcing a cookie policy, the CookieJar appears to be mostly concerned with thread safety, which could be replicated in the Cookies class.

> Which cookie attributes does httpx honor? For example, does "expires" actually take effect?

It's domain, path and expires. The domain and path are tested, and expires is enforced at the CookieJar level. Each time httpx.Cookies.set_cookie_header is called, it calls http.cookiejar.CookieJar.add_cookie_header, which clears the expired cookies: https://github.com/python/cpython/blob/24b5cbd3dce3fe37cdc787ccedd1e73a4f8cfc3c/Lib/http/cookiejar.py#L1387.
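The expiry behaviour can be observed with the standard library alone. The sketch below does not go through httpx; it calls CookieJar.add_cookie_header directly with a urllib.request.Request, under the assumption that this matches what happens underneath httpx.Cookies.set_cookie_header. The make_cookie helper, the example.com domain and the cookie names are hypothetical and chosen only for illustration.

```python
import time
import urllib.request
from http.cookiejar import Cookie, CookieJar


def make_cookie(name, expires):
    """Hypothetical helper: a version-0 cookie for example.com with the given expiry."""
    return Cookie(
        version=0, name=name, value="1", port=None, port_specified=False,
        domain="example.com", domain_specified=False, domain_initial_dot=False,
        path="/", path_specified=True, secure=False, expires=expires,
        discard=False, comment=None, comment_url=None, rest={},
    )


jar = CookieJar()
jar.set_cookie(make_cookie("stale", expires=1))  # expired long ago (epoch + 1s)
jar.set_cookie(make_cookie("fresh", expires=int(time.time()) + 3600))

names_before = sorted(cookie.name for cookie in jar)

# Building the Cookie header skips expired cookies and then purges them from the jar.
request = urllib.request.Request("http://example.com/")
jar.add_cookie_header(request)

cookie_header = request.get_header("Cookie")
names_after = sorted(cookie.name for cookie in jar)

print(names_before)   # both cookies are stored initially
print(cookie_header)  # only the unexpired cookie is sent
print(names_after)    # the expired cookie is gone from the jar
```

A dictionary-backed replacement for the jar would need to reproduce both effects: filtering expired cookies out of the header and removing them from storage.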

vergeev, Nov 05 '23 15:11