Cookies containing char '?' are not received correctly on Tomcat 7
I'm using mechanize for some automation purposes and noticed that a Cookie value is not correctly received on Tomcat 7.
Mechanize sends:
Cookie: COOKIE_NAME=/context/UI/Login?xyz=abcd
Tomcat 7 treats ? as a cookie separator while parsing and thus only receives COOKIE_NAME => /context/UI/Login.
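To illustrate the truncation, here is a minimal sketch of a parser that stops an unquoted value at a separator. It is not Tomcat's code: `read_cookie` and `VALUE_TERMINATORS` are hypothetical names, and the terminator set is an assumption chosen only to reproduce the behaviour described above.

```ruby
# Simplified model of a separator-aware cookie parser. NOT Tomcat's code:
# the terminator set below is an assumption that matches this report.
VALUE_TERMINATORS = /[;,? \t]/

def read_cookie(header)
  name, rest = header.split("=", 2)
  rest = rest.to_s
  value =
    if rest.start_with?('"')
      # Quoted value: read up to the closing quote, separators included.
      # (Backslash escapes are kept as-is in this sketch.)
      rest[/\A"((?:[^"\\]|\\.)*)"/, 1]
    else
      # Unquoted value: stop at the first terminator character.
      rest[0, rest.index(VALUE_TERMINATORS) || rest.length]
    end
  [name, value]
end

p read_cookie('COOKIE_NAME=/context/UI/Login?xyz=abcd')
# => ["COOKIE_NAME", "/context/UI/Login"]
p read_cookie('COOKIE_NAME="/context/UI/Login?xyz=abcd"')
# => ["COOKIE_NAME", "/context/UI/Login?xyz=abcd"]
```

With such a parser, only the quoted form survives intact, which is why quoting on the client side works around the problem.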
Current browsers also treat ? as a separator and send the cookie value quoted:
COOKIE_NAME="/context/UI/Login?xyz=abcd"
Mechanize/http-cookie only treats some control characters and ,;\ as delimiters to determine whether cookie values should be quoted: https://github.com/sparklemotion/http-cookie/blob/405a48bcb41b0a99dbd2386a7c217a280e958dff/lib/http/cookie/scanner.rb#L13
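As a rough illustration of that quoting decision, here is a self-contained sketch. It assumes the library's pattern is the one from my monkey patch without the added `?`; `quote_if_needed` is a hypothetical stand-in, not the library's actual method.

```ruby
# Assumed current pattern: control characters, space, DEL, and ",;\
RE_BAD_CHAR = /([\x00-\x20\x7F",;\\])/

# Hypothetical stand-in for the quoting step: quote only when a "bad"
# character is present, escaping backslashes and double quotes.
def quote_if_needed(value)
  return value unless value.match?(RE_BAD_CHAR)
  '"' + value.gsub(/([\\"])/) { "\\#{$1}" } + '"'
end

puts quote_if_needed('/context/UI/Login?xyz=abcd')  # prints the value unchanged
puts quote_if_needed('a;b')                         # prints "a;b" (with quotes)
```

Since `?` is not in the pattern, the value from this report goes out unquoted.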
It seems cookie handling is a complex topic and the delimiters are not clearly specified. Looking at Tomcat's cookie source code, there are different scenarios in which it treats even more characters as delimiters (e.g. all HTTP RFC 2616 token delimiters, which include ?/(){} etc.).
I suggest we add these token delimiters to the RE_BAD_CHAR regexp so that strings containing them get quoted. I think it won't break anything if we proactively add some more quotes (I don't see a case where additional quotes would cause a problem).
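A sketch of what such an extended pattern might look like, using the separator list from RFC 2616 section 2.2 (`( ) < > @ , ; : \ " / [ ] ? = { }` plus SP and HT). `RE_BAD_CHAR_EXTENDED` is a hypothetical name, not something the library defines.

```ruby
# Control characters (which cover SP and HT), DEL, and the RFC 2616
# separators. Hypothetical pattern for discussion only.
RE_BAD_CHAR_EXTENDED = /[\x00-\x20\x7F()<>@,;:\\"\/\[\]?={}]/

%w[abc123 /context/UI/Login?xyz=abcd a=b].each do |value|
  needs_quotes = value.match?(RE_BAD_CHAR_EXTENDED)
  puts "#{value.inspect}: #{needs_quotes ? 'quote' : 'leave as is'}"
end
```

Note that the full set includes / and =, so far more values would get quoted than with ? alone; adding only the characters a given server actually trips over is the more conservative option.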
For now, I'm monkey patching the cookie library to work around this:
require 'mechanize'
# Workaround: also treat "?" as a character that forces the cookie value to be quoted.
HTTP::Cookie::Scanner::RE_BAD_CHAR = /([\x00-\x20\x7F",;\\\?])/
Thanks for your great work!
For reference, the cookie processing code in the latest Tomcat is at:
- https://svn.apache.org/repos/asf/tomcat/trunk/java/org/apache/tomcat/util/http/Rfc6265CookieProcessor.java
- https://svn.apache.org/repos/asf/tomcat/trunk/java/org/apache/tomcat/util/http/LegacyCookieProcessor.java
I couldn't confirm the part about browsers quoting cookie values. As far as I could observe, if I set a cookie foo=bar?baz in Chrome and Firefox (both the latest stable versions), they send Cookie: foo=bar?baz; other_cookies.... There were no double quotes.
If I fed them foo="bar?baz" they'd send Cookie: foo="bar?baz"; other_cookies..., and if I fed them foo="bar; baz" then Cookie: foo="bar; other_cookies.... So, I could only conclude that they don't support double quotes at all. (!)
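These observations are consistent with a parser that cuts the name-value pair at the first semicolon and gives double quotes no special meaning, which is how I read the parsing algorithm in RFC 6265 section 5.2. A minimal sketch of that step (`name_value_pair` is a hypothetical helper, not a browser or library API):

```ruby
# Take everything before the first ";" as the name-value pair, split it at
# the first "=", and trim whitespace. Double quotes are ordinary characters.
def name_value_pair(set_cookie_string)
  pair = set_cookie_string.split(";", 2).first.to_s
  name, value = pair.split("=", 2)
  [name.to_s.strip, value.to_s.strip]
end

p name_value_pair('foo=bar?baz')     # => ["foo", "bar?baz"]
p name_value_pair('foo="bar?baz"')   # => ["foo", "\"bar?baz\""]
p name_value_pair('foo="bar; baz"')  # => ["foo", "\"bar"]
```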
So my question is: doesn't the server actually send a Set-Cookie header with the value double-quoted? If so, I guess the problem is in the parser that unquotes the double-quoted value, not in the serializer that composes the Cookie header value.