aiohttp
aiohttp copied to clipboard
Compliance with rfc6265 for cookie expires dates
Long story short
Compliance with rfc6265 for cookie expires functionality. Took a while yesterday trying to work out why some client code wasn't working correctly, eventually tracked it down to cookie handling in aiohttp; well in pythons http.cookies. The server was returning an old but should be valid date format.
Expected behaviour
The specific problems I had are related to expires. Digging through the rfc's my current understanding is that expires as defined in rfc6265 states that the date format should be a sane-cookie-date as described in https://tools.ietf.org/html/rfc2616#section-3.3.1
Sun, 06 Nov 1994 08:49:37 GMT ; RFC 822, updated by RFC 1123
Sunday, 06-Nov-94 08:49:37 GMT ; RFC 850, obsoleted by RFC 1036
Sun Nov 6 08:49:37 1994 ; ANSI C's asctime() format
Problem is that aiohttp doesn't handle all these options and it appears to strip cookies that have two of these formats.
Steps to reproduce
import sys
import http.server
import socketserver
class MainHandler(http.server.BaseHTTPRequestHandler):
def do_GET(self):
self.send_response(200)
self.send_header('Set-Cookie', 'verify1=verify1; Path=/;')
self.send_header('Set-Cookie', 'rfc822=rfc822value; expires=Fri, 06 Nov 2020 08:49:37 GMT; Max-Age=31449600; Path=/;')
self.send_header('Set-Cookie', 'rfc850=rfc850value; expires=Friday, 06-Nov-20 08:49:37 GMT; Max-Age=31449600; Path=/;')
self.send_header('Set-Cookie', 'ansic=ansicvalue; expires=Fri Nov 6 08:49:37 2020; Max-Age=31449600; Path=/;')
self.send_header('Set-Cookie', 'common=commonvalue; expires=Fri, 06-Nov-2020 16:01:51 GMT; Max-Age=31449600; Path=/;')
self.send_header('Set-Cookie', 'expired=expiredvalue; expires=Fri, 06 Nov 1999 14:18:35 GMT; Path=/;')
self.end_headers()
self.wfile.write(b'cookies!')
def main(port):
with socketserver.TCPServer(("", port), MainHandler) as httpd:
print("serving at port", port)
httpd.serve_forever()
if __name__ == '__main__':
main(int(sys.argv[1]))
This is a simple python server to return the different values, I included the common type with 4 digit years as everywhere seems to use it.
If I run a request with aiohttp
import aiohttp
import asyncio
async def main():
async with aiohttp.ClientSession() as session:
async with session.get('http://localhost:8000') as response:
await response.text()
for c in response.cookies:
print(c)
print('in session')
for c in session.cookie_jar:
print(c)
if __name__ == '__main__':
loop = asyncio.get_event_loop()
loop.run_until_complete(main())
Actual behaviour
I will get output along the lines of
verify1
rfc822
common
expired
in session
Set-Cookie: verify1=verify1; Domain=localhost; Path=/
Set-Cookie: rfc822=rfc822value; Domain=localhost; expires=Fri, 06 Nov 2020 08:49:37 GMT; Max-Age=31449600; Path=/
Set-Cookie: common=commonvalue; Domain=localhost; expires=Fri, 06-Nov-2020 16:01:51 GMT; Max-Age=31449600; Path=/
rfc850 and ansic dates are excluded during response parsing.
I'm not sure if supporting these formats is important to you, I'm going to get the server we are connecting to to switch formats aliviating the problem for me specifically. Just thought I would report it anyway.
OS: MacOSX Python: 3.7 aiohttp: 3.6.2
Seems like we have own datetime parser in cookiejar.py, the parser is used on the client side. If you want to fix parser's regexes -- you are welcome, please make a PR. Tests for additional formats are required. Otherwise, I doubt if somebody wants to pick the issue up. Speaking for myself, I don't want to invest my personal time into the support of very outdated specs.
That's fair enough. As I say I have replaced the server code that was generating these cookies so I'm not in any rush personally to have it fixed.
These three various date formats are the current specs though not an old spec. Also I'm not sure what regexs you are referring to, as I see it the python standard library http.cookies parser needs to be replaced with a compliant parser; but there could be some other existing code in this lib im not aware of.
A library like https://gitlab.com/sashahart/cookies has already done the heavy work of creating a valid parser. Sadly this library is old and unmaintained for a long time, pull requests and questions ignored. But as a proof of concept you can do something horrible like the below change to show library code is working. I didn't make a pull request as I am in no way advocating this change, I just spend a bit of time seeing what was happening.
diff --git a/aiohttp/client_reqrep.py b/aiohttp/client_reqrep.py
index 3b74fde4..97a202a0 100644
--- a/aiohttp/client_reqrep.py
+++ b/aiohttp/client_reqrep.py
@@ -23,6 +23,8 @@ from typing import ( # noqa
)
import attr
+from cookies import Cookie # type: ignore
+from cookies import CookieError as CustomCookieError # type: ignore
from multidict import CIMultiDict, CIMultiDictProxy, MultiDict, MultiDictProxy
from yarl import URL
@@ -820,7 +822,11 @@ class ClientResponse(HeadersMixin):
# cookies
for hdr in self.headers.getall(hdrs.SET_COOKIE, ()):
try:
- self.cookies.load(hdr)
+ try:
+ cookie = Cookie.from_string(hdr)
+ except CustomCookieError:
+ raise CookieError()
+ self.cookies.load(cookie.render_response())
except CookieError as exc:
client_logger.warning(
'Can not load response cookies: %s', exc)
With this change all formats in the spec work and all existing tests pass as I converted back to a normal SimpleCookie instance anyway.
I guess we could take the bulk of the parse code from this MIT licensed 3rd party cookies lib (there may be other alternatives) and make it return a SimpleCookie and use the python http.cookies exceptions to maintain compatibility and expectations of any code using cookiejar. Is that a strategy you would approve of?
I thought about these regexps
If some awesome library for working with cookies already exists -- this is cool, we can consider its usage. Sadly the tool mentioned by you looks as not maintained anymore, I expect problems in the future by this.
Maybe the work is not that hard, fixing a few regular expressions do what do you need?
Is #4493 a duplicate of this issue? (Found this after commenting over there.)
I've tried fixing my issue at https://github.com/aio-libs/aiohttp/blob/master/aiohttp/cookiejar.py#L48 like this:
DATE_YEAR_RE = re.compile(r"-?(\d{2,4})")
But that didn't work. DATE_YEAR_RE is only used in _parse_date() and that doesn't seem to be called at all. I placed an exit(1) in it and my demo exited normally.
The ANSI C format doesn't work in http.cookies, so a bug should be reported to cpython to fix that one:
>>> from http.cookies import SimpleCookie
>>> s = SimpleCookie()
>>> s.load("ansic=ansicvalue; expires=Fri 06 Nov 2020 08:49:37 GMT; Max-Age=31449600; Path=/")
>>> s
<SimpleCookie: >
Once working in SimpleCookie, it should work here, the regex looks like it would match correctly.
The RFC 850 one does work in SimpleCookie and just needs the regex updating as asvetlov mentioned.