aiohttp icon indicating copy to clipboard operation
aiohttp copied to clipboard

Compliance with rfc6265 for cookie expires dates

Open bob-pasabi opened this issue 6 years ago • 5 comments

Long story short

Compliance with rfc6265 for cookie expires functionality. Took a while yesterday trying to work out why some client code wasn't working correctly, eventually tracked it down to cookie handling in aiohttp; well in pythons http.cookies. The server was returning an old but should be valid date format.

Expected behaviour

The specific problems I had are related to expires. Digging through the rfc's my current understanding is that expires as defined in rfc6265 states that the date format should be a sane-cookie-date as described in https://tools.ietf.org/html/rfc2616#section-3.3.1

  Sun, 06 Nov 1994 08:49:37 GMT  ; RFC 822, updated by RFC 1123
  Sunday, 06-Nov-94 08:49:37 GMT ; RFC 850, obsoleted by RFC 1036
  Sun Nov  6 08:49:37 1994       ; ANSI C's asctime() format

Problem is that aiohttp doesn't handle all these options and it appears to strip cookies that have two of these formats.

Steps to reproduce

import sys
import http.server
import socketserver

class MainHandler(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.send_header('Set-Cookie', 'verify1=verify1; Path=/;')
        self.send_header('Set-Cookie', 'rfc822=rfc822value; expires=Fri, 06 Nov 2020 08:49:37 GMT; Max-Age=31449600; Path=/;')
        self.send_header('Set-Cookie', 'rfc850=rfc850value; expires=Friday, 06-Nov-20 08:49:37 GMT; Max-Age=31449600; Path=/;')
        self.send_header('Set-Cookie', 'ansic=ansicvalue; expires=Fri Nov  6 08:49:37 2020; Max-Age=31449600; Path=/;')
        self.send_header('Set-Cookie', 'common=commonvalue; expires=Fri, 06-Nov-2020 16:01:51 GMT; Max-Age=31449600; Path=/;')
        self.send_header('Set-Cookie', 'expired=expiredvalue; expires=Fri, 06 Nov 1999 14:18:35 GMT; Path=/;')
        self.end_headers()
        self.wfile.write(b'cookies!')

def main(port):
    with socketserver.TCPServer(("", port), MainHandler) as httpd:
        print("serving at port", port)
        httpd.serve_forever()

if __name__ == '__main__':
    main(int(sys.argv[1]))

This is a simple python server to return the different values, I included the common type with 4 digit years as everywhere seems to use it.

If I run a request with aiohttp

import aiohttp
import asyncio

async def main():
    async with aiohttp.ClientSession() as session:
        async with session.get('http://localhost:8000') as response:
            await response.text()

            for c in response.cookies:
                print(c)
            print('in session')
            for c in session.cookie_jar:
                print(c)

if __name__ == '__main__':
    loop = asyncio.get_event_loop()
    loop.run_until_complete(main())

Actual behaviour

I will get output along the lines of

verify1
rfc822
common
expired
in session
Set-Cookie: verify1=verify1; Domain=localhost; Path=/
Set-Cookie: rfc822=rfc822value; Domain=localhost; expires=Fri, 06 Nov 2020 08:49:37 GMT; Max-Age=31449600; Path=/
Set-Cookie: common=commonvalue; Domain=localhost; expires=Fri, 06-Nov-2020 16:01:51 GMT; Max-Age=31449600; Path=/

rfc850 and ansic dates are excluded during response parsing.

I'm not sure if supporting these formats is important to you, I'm going to get the server we are connecting to to switch formats aliviating the problem for me specifically. Just thought I would report it anyway.

OS: MacOSX Python: 3.7 aiohttp: 3.6.2

bob-pasabi avatar Nov 06 '19 14:11 bob-pasabi

Seems like we have own datetime parser in cookiejar.py, the parser is used on the client side. If you want to fix parser's regexes -- you are welcome, please make a PR. Tests for additional formats are required. Otherwise, I doubt if somebody wants to pick the issue up. Speaking for myself, I don't want to invest my personal time into the support of very outdated specs.

asvetlov avatar Nov 07 '19 16:11 asvetlov

That's fair enough. As I say I have replaced the server code that was generating these cookies so I'm not in any rush personally to have it fixed.

These three various date formats are the current specs though not an old spec. Also I'm not sure what regexs you are referring to, as I see it the python standard library http.cookies parser needs to be replaced with a compliant parser; but there could be some other existing code in this lib im not aware of.

A library like https://gitlab.com/sashahart/cookies has already done the heavy work of creating a valid parser. Sadly this library is old and unmaintained for a long time, pull requests and questions ignored. But as a proof of concept you can do something horrible like the below change to show library code is working. I didn't make a pull request as I am in no way advocating this change, I just spend a bit of time seeing what was happening.

diff --git a/aiohttp/client_reqrep.py b/aiohttp/client_reqrep.py
index 3b74fde4..97a202a0 100644
--- a/aiohttp/client_reqrep.py
+++ b/aiohttp/client_reqrep.py
@@ -23,6 +23,8 @@ from typing import (  # noqa
 )

 import attr
+from cookies import Cookie  # type: ignore
+from cookies import CookieError as CustomCookieError  # type: ignore
 from multidict import CIMultiDict, CIMultiDictProxy, MultiDict, MultiDictProxy
 from yarl import URL

@@ -820,7 +822,11 @@ class ClientResponse(HeadersMixin):
         # cookies
         for hdr in self.headers.getall(hdrs.SET_COOKIE, ()):
             try:
-                self.cookies.load(hdr)
+                try:
+                    cookie = Cookie.from_string(hdr)
+                except CustomCookieError:
+                    raise CookieError()
+                self.cookies.load(cookie.render_response())
             except CookieError as exc:
                 client_logger.warning(
                     'Can not load response cookies: %s', exc)

With this change all formats in the spec work and all existing tests pass as I converted back to a normal SimpleCookie instance anyway.

I guess we could take the bulk of the parse code from this MIT licensed 3rd party cookies lib (there may be other alternatives) and make it return a SimpleCookie and use the python http.cookies exceptions to maintain compatibility and expectations of any code using cookiejar. Is that a strategy you would approve of?

bob-pasabi avatar Nov 11 '19 19:11 bob-pasabi

I thought about these regexps

If some awesome library for working with cookies already exists -- this is cool, we can consider its usage. Sadly the tool mentioned by you looks as not maintained anymore, I expect problems in the future by this.

Maybe the work is not that hard, fixing a few regular expressions do what do you need?

asvetlov avatar Nov 12 '19 11:11 asvetlov

Is #4493 a duplicate of this issue? (Found this after commenting over there.)

I've tried fixing my issue at https://github.com/aio-libs/aiohttp/blob/master/aiohttp/cookiejar.py#L48 like this:

    DATE_YEAR_RE = re.compile(r"-?(\d{2,4})")

But that didn't work. DATE_YEAR_RE is only used in _parse_date() and that doesn't seem to be called at all. I placed an exit(1) in it and my demo exited normally.

plotski avatar Oct 23 '20 17:10 plotski

The ANSI C format doesn't work in http.cookies, so a bug should be reported to cpython to fix that one:

>>> from http.cookies import SimpleCookie
>>> s = SimpleCookie()
>>> s.load("ansic=ansicvalue; expires=Fri 06 Nov 2020 08:49:37 GMT; Max-Age=31449600; Path=/")
>>> s
<SimpleCookie: >

Once working in SimpleCookie, it should work here, the regex looks like it would match correctly.

The RFC 850 one does work in SimpleCookie and just needs the regex updating as asvetlov mentioned.

Dreamsorcerer avatar Aug 18 '24 00:08 Dreamsorcerer