unit icon indicating copy to clipboard operation
unit copied to clipboard

Improved variable parsing.

Open hongzhidao opened this issue 4 years ago • 19 comments

Hi.

See the test in test/test_variables.py.

check_variables("\"routes${}\"")

I think the function nxt_var_next_part should have the same result both at $ and ${}. And the empty variable is meaningless.

  1. Eliminate the check.
             if (end - p < 2) {
                 return NULL;
             }
  1. Then I found ${} is OK in nxt_var_next_part(). We need to check length == 0.
  2. I like the tricky in the current implementation.
end = p + bracket;

But I did a small rework to make the code more readable.

       if (*p == '{') {
            bracket = 1;

            p++;

            if (p == end) {
                return NULL;
            }

        } else {
            bracket = 0;
        }

        start = p;
        length = 0;

        while (p < end) {
            c = *p++;
            c = (u_char) (c | 0x20);

            if ((c >= 'a' && c <= 'z') || c == '_') {
                length++;
                continue;
            }

            if (bracket && c == '}') {
                bracket = 0;
                break;
            }

            p--;
            break;
        }

        if (bracket || length == 0) {
            return NULL;
        }

        end = p;

Here's the patch.

deleted

BTW, no need to change test cases.

hongzhidao avatar Nov 14 '20 12:11 hongzhidao

Add a ticket. Make the share option support variables. https://gist.github.com/hongzhidao/d9e8869198d4c2af408fe36d0d41e60a

I think variable is suitable for action use, now pass is supported. In the real-world, share is also a common requirement. So I try to support it.

Please note these.

  1. I've tried to introduce NXT_CONF_MAP_VAR and do the nxt_var_compile() in nxt_conf_map_object() , but it's not a good idea.
  2. nxt_http_route_action_create() is worth to be refactored that we need to keep the action order in the whole Unit. That is return, share, proxy, and pass. I did a tiny rework to make the logic clearer.
  3. Introduced nxt_var_query_final. This is similar to md5. md5_init, md5_update, and md5_final. But I can't understand the waiting in the query. I assume nxt_var_query() always run successfully, I didn't handle failed yet.
  4. tests are not added yet.
  5. May location is welcome to support variables as well.

Welcome to your suggestion.

hongzhidao avatar Nov 16 '20 02:11 hongzhidao

Variable: improved parsing in case of empty.

Updated. Added another two rework.

  1. Style in nxt_var.c.
  2. Fixed the allocation size of var indexes.
deleted

hongzhidao avatar Nov 16 '20 08:11 hongzhidao

Updated. https://gist.github.com/hongzhidao/d9e8869198d4c2af408fe36d0d41e60a

  1. Variable: improved parsing in case of empty. (done)
  2. Refactored out nxt_var_info_t for introducing $var_name format. (done)
  3. Supported $arg_name, $header_name, $cookie_name variables. (draft)
  4. Make share support varialbles. (draft)

hongzhidao avatar Nov 17 '20 09:11 hongzhidao

@hongzhidao Could you elaborate how 2 should help to implement 3? These structures was made this way for performance reasons. It's better to walk over data that is placed sequentially in memory and have more chances to fit into cache line.

VBart avatar Nov 17 '20 10:11 VBart

@VBart First of all, I think there are forms of variables. One is simple such as $host, $uri, etc. The other is complex like $var_name. These complex variables correspond to array, list, hash structures in Unit code. So the char _ is good to show this form. And non-HTTP-request has this form. I'd like to call the second part in $var_name as name.

These structures was made this way for performance reasons.

Thanks for your remind, I also notice it.

  1. At first, I want to place all these like the following. |-- nxt_var_t --|-- indexes --|--positions --|-- names --|-- plain chars --| Since each index size is 32(uint32_t), we can use the high 16bits for var index, and the low 16bits for name index. But It'll make code complex before I know your intention.
  2. It seems nxt_var_info_t can make code simple if it's welcome. (ignore)

that is placed sequentially in memory and have more chances to fit into cache line.

I get it. Good idea.

hongzhidao avatar Nov 17 '20 10:11 hongzhidao

@hongzhidao Let's first think about how we can make those $arg_*, $http_*, $cookie_* magic variables the most efficient way.

And I suppose that the most significant efficiency we can achieve here if we won't lookup for the same header or the same argument twice or more, i.e. we will cache the result of the first lookup for the request lifetime.

Also we can pre-create the cache for number of needed elements during the configuration phase to avoid lookup for names. It can be just a static index with name values.

Since we have limited number of such magic variables (3 right now) than we can reserve high 2 bits in uint32_t index to identify those magic variables. The rest 30 bits will identify an index position related to the name part.

Then if we see those 2 bits set, then instead of looking for arbitrary variable handler, we call a special variable handler related to this magic variable and pass the rest 30 bits value to it.

This special variable handler look for in his own index at position set by those rest 30 bits for a variable value. If it contains NULL in nxt_str_t.start then it looks for the name in static index that is constructed during configuration phase, and then it does a search for this name in arguments, cookies, or headers and initialize this position for found value or to an empty string if it's not found.

These indexes for $arg_*, $http_*, $cookie_* then can be used for matching rules as well, so they need to be separated from the variables.

VBart avatar Nov 17 '20 11:11 VBart

What do you think, does it make sense to you?

VBart avatar Nov 17 '20 11:11 VBart

I suppose that the most significant efficiency we can achieve here if we won't lookup for the same header or the same argument twice or more

Agree. I'll reimplement it again.

BTW, I'm wondering if the variable will be affected after introducing regex support? I don't know its usage.

hongzhidao avatar Nov 17 '20 11:11 hongzhidao

@hongzhidao Regex support in its first implementation won't affect variables. Only pattern matching is affected. We will commit it soon (tomorrow probably).

VBart avatar Nov 17 '20 12:11 VBart

@VBart

  1. Ready patches. https://gist.github.com/hongzhidao/67a410b3dcabd23791e63a48c47ee7cc
  • Variable: improved parsing in case of empty.
  • Improved route action absent. rework action_create is to support the share variable upcoming.
  1. Speedup match.

does it make sense to you?

Yes.

These indexes for $arg_, $http_, $cookie_* then can be used for matching rules as well, so they need to be separated from the variables.

Very good reminder. And I'll try to do the route match optimization first. It seems all the queries($var, match) can be unified into index(uint32_t).

hongzhidao avatar Nov 18 '20 02:11 hongzhidao

@VBart Take a look, please. https://gist.github.com/hongzhidao/c9f1957386d180c1b330ee950bff42c1 I use match scheme for experiments.

In my understanding:

we will cache the result of the first lookup for the request lifetime.

Yes.

There are two query forms. One is variable, the other is route matching. Both of them look like xxx_yyy, we can name it as var_name, now. So we need two indexes: var_index and name_index. var_index corresponds to nxt_var_hash. name_index corresponds to route_conf->name_hash. (May there are many name_hash).

we can reserve high 2 bits in uint32_t index to identify those magic variables.

Agree.

So there two APIs required with var_index. nxt_var_hash_index() and nxt_var_index_get()

Because variable support multiple variables, nxt_var_query() is quite good now.

Still going on :)

hongzhidao avatar Nov 18 '20 09:11 hongzhidao

@VBart https://gist.github.com/hongzhidao/0e8add2f2c1fb7a9472dd7b2ccb04aeb This is clearer.

I suppose that the most significant efficiency we can achieve here if we won't lookup for the same header or the same argument twice or more

We can reach it.

hongzhidao avatar Nov 19 '20 07:11 hongzhidao

@VBart Could you help check the usage of the duplicate header?

Take a look at the example. http://hg.nginx.org/nginx-tests/file/tip/js_headers.t#l433

With -H "Foo: a" -H "Foo: b". In NGINX, the value of $http_foo is "a". In NJS, the value of $r.variables.foo is "a,b", it looks good.

In Unit, the route matching has already considered this situation. Which way will be accepted in Unit?

Thanks.

hongzhidao avatar Nov 19 '20 09:11 hongzhidao

@hongzhidao I think for variables the NJS implementation is better.

VBart avatar Nov 19 '20 13:11 VBart

@hongzhidao It looks like layering violation that you use *_var_* functions for route matches. The mechanism for hashing args/headers/cookies should be more generic and separated from variables and routes. It's just some generic API that allows to fast cached lookups for header fields, cookies and argument in a request object. Then this API can be used in routes and for magic variables.

VBart avatar Nov 19 '20 13:11 VBart

I think I've verified your suggestions are correct.

It looks like layering violation that you use var functions for route matches

I've found some better naming. Such as.

  1. router_conf
typedef struct {
    nxt_mp_t           *pool;
    nxt_lvlhsh_t       hash;
    uint32_t           count;
    nxt_str_t          *names;
} nxt_var_cache_t;

typedef struct {
    ...
    nxt_var_cache_t          var_cache;
} nxt_router_conf_t;
  1. nxt_var_query_proto rename nxt_var_cache_proto as nxt_var_query_proto. I think we don't need to naming cache in nxt_var.c. The query has expressed its intent.

  2. var_param, var_index, param_index

But wait for the complete implementation :)

The mechanism for hashing args/headers/cookies should be more generic and separated from variables and routes.

Yes, I want to move them into nxt_http_request.c.

I will complete these(route matching and variables) in the next step. Then we decide what needs to be separated/refactor.

hongzhidao avatar Nov 19 '20 14:11 hongzhidao

I think for variables the NJS implementation is better.

'\t' is more suitable than ','. http://www.asciitable.com/

Here's a diff just for checking the idea. https://gist.github.com/hongzhidao/41a230d5cde3633380ae9368fde49a91

hongzhidao avatar Nov 23 '20 08:11 hongzhidao

@VBart Unified route matching and variable lookup. https://gist.github.com/hongzhidao/dc37812019cee2ff5b7f5d2e290b366a

It's ready to review. But nxt_entry_handler_t is not clear, I changed a bit.

hongzhidao avatar Nov 24 '20 08:11 hongzhidao

updated. https://gist.github.com/hongzhidao/c6aacdbf5d3a6ba8718e0956d4a530ac supported another format $headers['X-Some-Header']

hongzhidao avatar Nov 25 '20 04:11 hongzhidao