unit
unit copied to clipboard
HTTP: refactored cookie parser.
As per https://datatracker.ietf.org/doc/html/rfc6265#section-4.1.1 cookie-value = *cookie-octet / ( DQUOTE *cookie-octet DQUOTE ) cookie-octet = %x21 / %x23-2B / %x2D-3A / %x3C-5B / %x5D-7E
So '=' is a valid character in cookie-value. The previous parser could not handle this situation. Now it supports parsing cookie-value which with '='.
Hi,
I suggest we change the summary to HTTP: fixed cookie parsing.
Hi, I suggest we change the summary to
HTTP: fixed cookie parsing.
https://github.com/changxiaocui/unit/commit/07dd99a679c62c7a0002cfc5decc9aa0def330a5
Hi,
static nxt_int_t
nxt_http_cookie_parse(nxt_array_t *cookies, u_char *start, const u_char *end)
{
size_t name_length;
u_char *p, *last, *name, *value;
nxt_http_name_value_t *nv;
name = NULL;
name_length = 0;
for (name = start; name < end; name = last + 1) {
last = nxt_memchr(name, ';', end - name);
if (last == NULL) {
last = end;
}
while (name < last && name[0] == ' ') { name++; }
value = nxt_memchr(name, '=', last - name);
if (value == NULL) {
value = last;
}
In my feelings, it's not a good idea to call nxt_memchr()
in the parsing since it adds more iterations.
And I think the previous loop for (...)
is correct and clear.
for (p = start; p < end; p++) {
...
}
What we need to do is fix the incorrect logic in for()
.
btw, we also plan to fix it in version 1.29.
This version has the best performance after tested multiple implementation versions. Also, the readability is better
deleted.
Note the tests can't pass in the current version, .... It depends rejecting or ignoring invalid cookie-name. I guess the failed case is
self.cookie('=foo=bar=baz', 404)
. Like theself.cookie('foo', 404) # 200 or 404?
, returning 200 or 400 is not hard to achive based on current version.
nxt_memchr is highly optimized and normal iteration in application layer is worse than it. you can verify this conclusion. @hongzhidao
returning 200 or 400 is not hard to achive based on current version.
I checked the existing test case, it should be 404. (@andrey-zelenkov, please check again)
the readability is better
In short, if the cookie value contains =
, there is an issue. Compare to the previous logic, the fixing logic is.
if (name != NULL) { /* add the condition to avoid parsing name again */
...
}
Fixing and optimization can be separated. Let's fix it first.
nxt_memchr is highly optimized and normal iteration in application layer is worse than it.
Could you share the reference?
The current version did the following:
- Remove leading and trailing spaces in cookie
- Allow cookie value contains =
- Keep all logic of cookie parsing in nxt_http_cookie_parse()
- Impove performance of the cookie parser
Besides, some cookie attribute like HttpOnly has no cookie value, so there is no =. Considering that cookie attribute is present in the response, we can reject it by
name_length = p - name;
if (name_length == 0 || value[0] != '=') {
return NXT_ERROR;
}
returning 200 or 400 is not hard to achive based on current version.
I checked the existing test case, it should be 404. (@andrey-zelenkov, please check again)
the readability is better
In short, if the cookie value contains
=
, there is an issue. Compare to the previous logic, the fixing logic is.if (name != NULL) { /* add the condition to avoid parsing name again */ ... }
Fixing and optimization can be separated. Let's fix it first.
nxt_memchr is highly optimized and normal iteration in application layer is worse than it.
Could you share the reference?
- Unit has skipped the around spaces.
- It's an issue that needs to be fixed.
- Sure, and keep it as simple as possible.
- I don't think it's worth to reimplement the whole function. If it has a performance problem, we should implement cached matching instead.
Here's the fixing patch.
diff -r 15275ad20e61 src/nxt_http_request.c
--- a/src/nxt_http_request.c Fri Sep 09 17:45:58 2022 +0400
+++ b/src/nxt_http_request.c Fri Sep 16 15:37:22 2022 +0800
@@ -1035,14 +1035,11 @@
for (p = start; p < end; p++) {
c = *p;
- if (c == '=') {
+ if (c == '=' && name == NULL) {
while (start[0] == ' ') { start++; }
name_length = p - start;
-
- if (name_length != 0) {
- name = start;
- }
+ name = start;
start = p + 1;
diff -r 15275ad20e61 test/test_routing.py
--- a/test/test_routing.py Fri Sep 09 17:45:58 2022 +0400
+++ b/test/test_routing.py Fri Sep 16 15:37:22 2022 +0800
@@ -1401,6 +1401,24 @@
self.route_match_invalid({"cookies": ["var"]})
self.route_match_invalid({"cookies": [{"foo": {}}]})
+ def test_routes_match_cookies_complex(self):
+ self.route_match({"cookies": {"foo": "bar=baz"}})
+
+ self.cookie('foo=bar=baz', 200)
+ self.cookie(' foo=bar=baz ', 200)
+ self.cookie('=foo=bar=baz', 404)
+ self.cookie('', 404)
+ self.cookie('=', 404)
+
+ self.route_match({"cookies": {"foo": ""}})
+ self.cookie('foo=', 200)
+ self.cookie('foo=;', 200)
+ self.cookie(' foo=;', 200)
+
+ self.cookie('foo', 404)
+ self.cookie('', 404)
+ self.cookie('=', 404)
+
def test_routes_match_cookies_multiple(self):
self.route_match({"cookies": {"foo": "bar", "blah": "blah"}})
- Logic is both complete and compact based on RFC
- Better performance and readability
The reasons stated above are sufficient.
And the last occurrence of the code below in previous version is confusing.
if (name != NULL) {
nv = nxt_http_cookie(cookies, name, name_length, start, p);
if (nxt_slow_path(nv == NULL)) {
return NXT_ERROR;
}
}
It's only a few lines of code after all. Anyway, It's up to you and unit.
And the last occurrence of the code below in previous version is confusing.
It's for a string that doesn't end with ;
.
And the last occurrence of the code below in previous version is confusing.
It's for a string that doesn't end with
;
.
I certainly know. But the flow of logic is interrupt. On this point, I agree with Linus (dragged to 14:30 https://www.youtube.com/watch?v=o8NPllzkFhE).
Actually, I have tried almost every method, including the one you pasted. After comparing multiple implementations, I think this is the best.
Since the current implementation is written by another team member. Here are my personal opinions.
Note I expose nxt_memchr()
as a simple loop.
void *
memchr (register const void *src_void, int c, size_t length)
{
const unsigned char *src = (const unsigned char *)src_void;
while (length-- > 0)
{
if (*src == c)
return (void *)src;
src++;
}
return NULL;
}
function parse1(...)
{
for (p = start; p < end; p++) {
c = *p;
if (c == '=' && name == NULL) {
while (start[0] == ' ') { start++; }
name_length = p - start;
name = start;
start = p + 1;
} else if (c == ';') {
if (name != NULL) {
nv = nxt_http_cookie(cookies, name, name_length, start, p);
if (nxt_slow_path(nv == NULL)) {
return NXT_ERROR;
}
}
name = NULL;
start = p + 1;
}
}
if (name != NULL) {
nv = nxt_http_cookie(cookies, name, name_length, start, p);
if (nxt_slow_path(nv == NULL)) {
return NXT_ERROR;
}
}
}
function parse2(...)
{
for (name = start; name < end; name = last + 1) {
for (last = name; last < end; last++) {
if (*last == ';') {
break;
}
}
while (name < last && name[0] == ' ') {
name++;
}
for (value = name; value < last; last++) {
if (*value == '=') {
break;
}
}
for (p = value; p > name && p[-1] == ' '; p--) { /* void */ }
name_length = p - name;
if (name_length > 0) {
p = last;
if (value < last) {
value++;
while (value < last && value[0] == ' ') {
value++;
}
while (p > value && p[-1] == ' ') {
p--;
}
}
nv = nxt_http_cookie(cookies, name, name_length, value, p);
if (nxt_slow_path(nv == NULL)) {
return NXT_ERROR;
}
}
}
}
It seems that it does loop twice for searching name and value.
I agree that the second nxt_http_cookie()
can be eliminated by some tricks. But considering it's clear and has no issue, I prefer to keep the main logic unchanged.
Unit has skipped the around spaces.
If so, my implementation could do better.
It seems that it does loop twice for searching name and value.
At first, I thought one loop is better. It turns out that glibc's memchr is faster, because it is highly optimized. Actually, the longer the string, the better the performance.
But considering it's clear and has no issue, I prefer to keep the main logic unchanged.
As I said before. It's only a few lines of code after all. Anyway, It's up to you and unit.
Unit has skipped the around spaces.
If so, my implementation could do better.
It seems that it does loop twice for searching name and value.
At first, I thought one loop is better. It turns out that glibc's memchr is faster, because it is highly optimized. Actually, the longer the string, the better the performance.
But considering it's clear and has no issue, I prefer to keep the main logic unchanged.
As I said before. It's only a few lines of code after all. Anyway, It's up to you and unit.
Thanks for the suggestion, we'll discuss it internally.
FWIW, I agree with @hongzhidao, we should start with the simple fix. Then we can look at dealing with any real performance or istylistic issues etc...
Hi @changxiaocui,
We are going to fix it. Here's the patch.
# HG changeset patch
# User Zhidao HONG <[email protected]>
# Date 1663526744 -28800
# Mon Sep 19 02:45:44 2022 +0800
# Node ID 6b6b979e821420bba206c717b68e0420d01ab010
# Parent 29b3edfb613db68a5f906162fdd9a6b34db8e20f
HTTP: fixed cookie parsing.
The fixing supports the cookie value with the '=' character.
This is related to #756 PR on Github.
Thanks to changxiaocui.
diff -r 29b3edfb613d -r 6b6b979e8214 docs/changes.xml
--- a/docs/changes.xml Mon Sep 19 11:59:59 2022 +0100
+++ b/docs/changes.xml Mon Sep 19 02:45:44 2022 +0800
@@ -31,6 +31,12 @@
date="" time=""
packager="Nginx Packaging <[email protected]>">
+<change type="bugfix">
+<para>
+fix HTTP cookie parsing when the value contains an equals sign.
+</para>
+</change>
+
</changes>
diff -r 29b3edfb613d -r 6b6b979e8214 src/nxt_http_request.c
--- a/src/nxt_http_request.c Mon Sep 19 11:59:59 2022 +0100
+++ b/src/nxt_http_request.c Mon Sep 19 02:45:44 2022 +0800
@@ -1035,14 +1035,11 @@
for (p = start; p < end; p++) {
c = *p;
- if (c == '=') {
+ if (c == '=' && name == NULL) {
while (start[0] == ' ') { start++; }
name_length = p - start;
-
- if (name_length != 0) {
- name = start;
- }
+ name = start;
start = p + 1;
diff -r 29b3edfb613d -r 6b6b979e8214 test/test_routing.py
--- a/test/test_routing.py Mon Sep 19 11:59:59 2022 +0100
+++ b/test/test_routing.py Mon Sep 19 02:45:44 2022 +0800
@@ -1401,6 +1401,20 @@
self.route_match_invalid({"cookies": ["var"]})
self.route_match_invalid({"cookies": [{"foo": {}}]})
+ def test_routes_match_cookies_complex(self):
+ self.route_match({"cookies": {"foo": "bar=baz"}})
+ self.cookie('foo=bar=baz', 200)
+ self.cookie(' foo=bar=baz ', 200)
+ self.cookie('=foo=bar=baz', 404)
+
+ self.route_match({"cookies": {"foo": ""}})
+ self.cookie('foo=', 200)
+ self.cookie('foo=;', 200)
+ self.cookie(' foo=;', 200)
+ self.cookie('foo', 404)
+ self.cookie('', 404)
+ self.cookie('=', 404)
+
def test_routes_match_cookies_multiple(self):
self.route_match({"cookies": {"foo": "bar", "blah": "blah"}})
Thanks for your report, and let me know if I didn't use your name correctly in the commit log.
Hi,
committed, thanks all! https://github.com/nginx/unit/commit/76df62a6236eba2ae1ea7ffe7b9599418b044a01
Let's just close this now. A fix was merged for the issue that inspired this PR...