unit HTTP: refactored cookie parser.

As per https://datatracker.ietf.org/doc/html/rfc6265#section-4.1.1 cookie-value = *cookie-octet / ( DQUOTE *cookie-octet DQUOTE ) cookie-octet = %x21 / %x23-2B / %x2D-3A / %x3C-5B / %x5D-7E

So '=' is a valid character in cookie-value. The previous parser could not handle this situation. Now it supports parsing cookie-value which with '='.

Sep 16 '22 02:09 changxiaocui

Hi, I suggest we change the summary to HTTP: fixed cookie parsing.

Sep 16 '22 02:09 hongzhidao

Hi, I suggest we change the summary to HTTP: fixed cookie parsing.

https://github.com/changxiaocui/unit/commit/07dd99a679c62c7a0002cfc5decc9aa0def330a5

Sep 16 '22 03:09 changxiaocui

Hi,

static nxt_int_t
nxt_http_cookie_parse(nxt_array_t *cookies, u_char *start, const u_char *end)
{
    size_t                 name_length;
    u_char                 *p, *last, *name, *value;
    nxt_http_name_value_t  *nv;

    name = NULL;
    name_length = 0;

    for (name = start; name < end; name = last + 1) {
        last = nxt_memchr(name, ';', end - name);
        if (last == NULL) {
            last = end;
        }

        while (name < last && name[0] == ' ') { name++; }

        value = nxt_memchr(name, '=', last - name);
        if (value == NULL) {
            value = last;
        }

In my feelings, it's not a good idea to call nxt_memchr() in the parsing since it adds more iterations. And I think the previous loop for (...) is correct and clear.

for (p = start; p < end; p++) {
    ...
}

What we need to do is fix the incorrect logic in for().

btw, we also plan to fix it in version 1.29.

Sep 16 '22 03:09 hongzhidao

This version has the best performance after tested multiple implementation versions. Also, the readability is better

Sep 16 '22 04:09 changxiaocui

deleted.

Sep 16 '22 05:09 hongzhidao

Note the tests can't pass in the current version, .... It depends rejecting or ignoring invalid cookie-name. I guess the failed case is self.cookie('=foo=bar=baz', 404). Like the self.cookie('foo', 404) # 200 or 404?, returning 200 or 400 is not hard to achive based on current version.

nxt_memchr is highly optimized and normal iteration in application layer is worse than it. you can verify this conclusion. @hongzhidao

Sep 16 '22 05:09 changxiaocui

returning 200 or 400 is not hard to achive based on current version.

I checked the existing test case, it should be 404. (@andrey-zelenkov, please check again)

the readability is better

In short, if the cookie value contains =, there is an issue. Compare to the previous logic, the fixing logic is.

if (name != NULL) { /* add the condition to avoid parsing name again */
      ...
}

Fixing and optimization can be separated. Let's fix it first.

nxt_memchr is highly optimized and normal iteration in application layer is worse than it.

Could you share the reference?

Sep 16 '22 05:09 hongzhidao

The current version did the following:

Remove leading and trailing spaces in cookie
Allow cookie value contains =
Keep all logic of cookie parsing in nxt_http_cookie_parse()
Impove performance of the cookie parser

Besides, some cookie attribute like HttpOnly has no cookie value, so there is no =. Considering that cookie attribute is present in the response, we can reject it by

name_length = p - name;
if (name_length == 0 || value[0] != '=') {
    return NXT_ERROR;
}

Sep 16 '22 07:09 changxiaocui

returning 200 or 400 is not hard to achive based on current version.

I checked the existing test case, it should be 404. (@andrey-zelenkov, please check again)

the readability is better

In short, if the cookie value contains =, there is an issue. Compare to the previous logic, the fixing logic is.
if (name != NULL) { /* add the condition to avoid parsing name again */
      ...
}
Fixing and optimization can be separated. Let's fix it first.

nxt_memchr is highly optimized and normal iteration in application layer is worse than it.

Could you share the reference?

cookie_test.c.txt

Sep 16 '22 07:09 changxiaocui

Unit has skipped the around spaces.
It's an issue that needs to be fixed.
Sure, and keep it as simple as possible.
I don't think it's worth to reimplement the whole function. If it has a performance problem, we should implement cached matching instead.

Here's the fixing patch.

diff -r 15275ad20e61 src/nxt_http_request.c
--- a/src/nxt_http_request.c    Fri Sep 09 17:45:58 2022 +0400
+++ b/src/nxt_http_request.c    Fri Sep 16 15:37:22 2022 +0800
@@ -1035,14 +1035,11 @@
     for (p = start; p < end; p++) {
         c = *p;

-        if (c == '=') {
+        if (c == '=' && name == NULL) {
             while (start[0] == ' ') { start++; }

             name_length = p - start;
-
-            if (name_length != 0) {
-                name = start;
-            }
+            name = start;

             start = p + 1;

diff -r 15275ad20e61 test/test_routing.py
--- a/test/test_routing.py      Fri Sep 09 17:45:58 2022 +0400
+++ b/test/test_routing.py      Fri Sep 16 15:37:22 2022 +0800
@@ -1401,6 +1401,24 @@
         self.route_match_invalid({"cookies": ["var"]})
         self.route_match_invalid({"cookies": [{"foo": {}}]})

+    def test_routes_match_cookies_complex(self):
+        self.route_match({"cookies": {"foo": "bar=baz"}})
+
+        self.cookie('foo=bar=baz', 200)
+        self.cookie('   foo=bar=baz   ', 200)
+        self.cookie('=foo=bar=baz', 404)
+        self.cookie('', 404)
+        self.cookie('=', 404)
+
+        self.route_match({"cookies": {"foo": ""}})
+        self.cookie('foo=', 200)
+        self.cookie('foo=;', 200)
+        self.cookie(' foo=;', 200)
+
+        self.cookie('foo', 404)
+        self.cookie('', 404)
+        self.cookie('=', 404)
+
     def test_routes_match_cookies_multiple(self):
         self.route_match({"cookies": {"foo": "bar", "blah": "blah"}})

Sep 16 '22 07:09 hongzhidao

Logic is both complete and compact based on RFC
Better performance and readability

The reasons stated above are sufficient.

And the last occurrence of the code below in previous version is confusing.

 if (name != NULL) {
        nv = nxt_http_cookie(cookies, name, name_length, start, p);
        if (nxt_slow_path(nv == NULL)) {
            return NXT_ERROR;
        }
    }

It's only a few lines of code after all. Anyway, It's up to you and unit.

Sep 16 '22 08:09 changxiaocui

And the last occurrence of the code below in previous version is confusing.

It's for a string that doesn't end with ;.

Sep 16 '22 09:09 hongzhidao

And the last occurrence of the code below in previous version is confusing.

It's for a string that doesn't end with ;.

I certainly know. But the flow of logic is interrupt. On this point, I agree with Linus (dragged to 14:30 https://www.youtube.com/watch?v=o8NPllzkFhE).

Actually, I have tried almost every method, including the one you pasted. After comparing multiple implementations, I think this is the best.

Sep 16 '22 10:09 changxiaocui

Since the current implementation is written by another team member. Here are my personal opinions.

Note I expose nxt_memchr() as a simple loop.

void *
memchr (register const void *src_void, int c, size_t length)
{
  const unsigned char *src = (const unsigned char *)src_void;
  
  while (length-- > 0)
  {
    if (*src == c)
     return (void *)src;
    src++;
  }
  return NULL;
}

function parse1(...)
{
    for (p = start; p < end; p++) {
        c = *p;

        if (c == '=' && name == NULL) {
            while (start[0] == ' ') { start++; }

            name_length = p - start;
            name = start;

            start = p + 1;

        } else if (c == ';') {
            if (name != NULL) {
                nv = nxt_http_cookie(cookies, name, name_length, start, p);
                if (nxt_slow_path(nv == NULL)) {
                    return NXT_ERROR;
                }
            }
            
            name = NULL;
            start = p + 1;
        }
    }

    if (name != NULL) {
        nv = nxt_http_cookie(cookies, name, name_length, start, p);
        if (nxt_slow_path(nv == NULL)) {
            return NXT_ERROR;
        }
    }
}

function parse2(...)
{
    for (name = start; name < end; name = last + 1) {

        for (last = name; last < end; last++) {
            if (*last == ';') {
                break;
            }
        }

        while (name < last && name[0] == ' ') {
            name++;
        }

        for (value = name; value < last; last++) {
            if (*value == '=') {
                break;
            }
        }

        for (p = value; p > name && p[-1] == ' '; p--) { /* void */ }

        name_length = p - name;

        if (name_length > 0) {
            p = last;

            if (value < last) {
                value++;

                while (value < last && value[0] == ' ') {
                    value++;
                }

                while (p > value && p[-1] == ' ') {
                    p--;
                }
            }

            nv = nxt_http_cookie(cookies, name, name_length, value, p);
            if (nxt_slow_path(nv == NULL)) {
                return NXT_ERROR;
            }
        }
    }
}

It seems that it does loop twice for searching name and value.

I agree that the second nxt_http_cookie() can be eliminated by some tricks. But considering it's clear and has no issue, I prefer to keep the main logic unchanged.

Sep 16 '22 10:09 hongzhidao

Unit has skipped the around spaces.

If so, my implementation could do better.

It seems that it does loop twice for searching name and value.

At first, I thought one loop is better. It turns out that glibc's memchr is faster, because it is highly optimized. Actually, the longer the string, the better the performance.

But considering it's clear and has no issue, I prefer to keep the main logic unchanged.

As I said before. It's only a few lines of code after all. Anyway, It's up to you and unit.

Sep 16 '22 11:09 changxiaocui

Unit has skipped the around spaces.

If so, my implementation could do better.

It seems that it does loop twice for searching name and value.

At first, I thought one loop is better. It turns out that glibc's memchr is faster, because it is highly optimized. Actually, the longer the string, the better the performance.

But considering it's clear and has no issue, I prefer to keep the main logic unchanged.

As I said before. It's only a few lines of code after all. Anyway, It's up to you and unit.

Thanks for the suggestion, we'll discuss it internally.

Sep 16 '22 11:09 hongzhidao

FWIW, I agree with @hongzhidao, we should start with the simple fix. Then we can look at dealing with any real performance or istylistic issues etc...

Sep 16 '22 11:09 ac000

Hi @changxiaocui,

We are going to fix it. Here's the patch.

# HG changeset patch
# User Zhidao HONG <[email protected]>
# Date 1663526744 -28800
#      Mon Sep 19 02:45:44 2022 +0800
# Node ID 6b6b979e821420bba206c717b68e0420d01ab010
# Parent  29b3edfb613db68a5f906162fdd9a6b34db8e20f
HTTP: fixed cookie parsing.

The fixing supports the cookie value with the '=' character.

This is related to #756 PR on Github.
Thanks to changxiaocui.

diff -r 29b3edfb613d -r 6b6b979e8214 docs/changes.xml
--- a/docs/changes.xml  Mon Sep 19 11:59:59 2022 +0100
+++ b/docs/changes.xml  Mon Sep 19 02:45:44 2022 +0800
@@ -31,6 +31,12 @@
          date="" time=""
          packager="Nginx Packaging &lt;[email protected]&gt;">

+<change type="bugfix">
+<para>
+fix HTTP cookie parsing when the value contains an equals sign.
+</para>
+</change>
+
 </changes>

diff -r 29b3edfb613d -r 6b6b979e8214 src/nxt_http_request.c
--- a/src/nxt_http_request.c    Mon Sep 19 11:59:59 2022 +0100
+++ b/src/nxt_http_request.c    Mon Sep 19 02:45:44 2022 +0800
@@ -1035,14 +1035,11 @@
     for (p = start; p < end; p++) {
         c = *p;

-        if (c == '=') {
+        if (c == '=' && name == NULL) {
             while (start[0] == ' ') { start++; }

             name_length = p - start;
-
-            if (name_length != 0) {
-                name = start;
-            }
+            name = start;

             start = p + 1;

diff -r 29b3edfb613d -r 6b6b979e8214 test/test_routing.py
--- a/test/test_routing.py      Mon Sep 19 11:59:59 2022 +0100
+++ b/test/test_routing.py      Mon Sep 19 02:45:44 2022 +0800
@@ -1401,6 +1401,20 @@
         self.route_match_invalid({"cookies": ["var"]})
         self.route_match_invalid({"cookies": [{"foo": {}}]})

+    def test_routes_match_cookies_complex(self):
+        self.route_match({"cookies": {"foo": "bar=baz"}})
+        self.cookie('foo=bar=baz', 200)
+        self.cookie('   foo=bar=baz   ', 200)
+        self.cookie('=foo=bar=baz', 404)
+
+        self.route_match({"cookies": {"foo": ""}})
+        self.cookie('foo=', 200)
+        self.cookie('foo=;', 200)
+        self.cookie('  foo=;', 200)
+        self.cookie('foo', 404)
+        self.cookie('', 404)
+        self.cookie('=', 404)
+
     def test_routes_match_cookies_multiple(self):
         self.route_match({"cookies": {"foo": "bar", "blah": "blah"}})

Thanks for your report, and let me know if I didn't use your name correctly in the commit log.

Sep 21 '22 02:09 hongzhidao

Hi,

committed, thanks all! https://github.com/nginx/unit/commit/76df62a6236eba2ae1ea7ffe7b9599418b044a01

Sep 22 '22 09:09 hongzhidao

Let's just close this now. A fix was merged for the issue that inspired this PR...

Nov 23 '23 02:11 ac000

unit unit copied to clipboard

HTTP: refactored cookie parser.

unit
unit copied to clipboard