Improved variable parsing.
Hi.
See the test in test/test_variables.py.
check_variables("\"routes${}\"")
I think the function nxt_var_next_part should have the same result both at $ and ${}.
And the empty variable is meaningless.
- Eliminate the check.
if (end - p < 2) {
return NULL;
}
- Then I found ${} is OK in
nxt_var_next_part(). We need to checklength == 0. - I like the tricky in the current implementation.
end = p + bracket;
But I did a small rework to make the code more readable.
if (*p == '{') {
bracket = 1;
p++;
if (p == end) {
return NULL;
}
} else {
bracket = 0;
}
start = p;
length = 0;
while (p < end) {
c = *p++;
c = (u_char) (c | 0x20);
if ((c >= 'a' && c <= 'z') || c == '_') {
length++;
continue;
}
if (bracket && c == '}') {
bracket = 0;
break;
}
p--;
break;
}
if (bracket || length == 0) {
return NULL;
}
end = p;
Here's the patch.
deleted
BTW, no need to change test cases.
Add a ticket. Make the share option support variables.
https://gist.github.com/hongzhidao/d9e8869198d4c2af408fe36d0d41e60a
I think variable is suitable for action use, now pass is supported.
In the real-world, share is also a common requirement. So I try to support it.
Please note these.
- I've tried to introduce
NXT_CONF_MAP_VARand do thenxt_var_compile()innxt_conf_map_object(), but it's not a good idea. nxt_http_route_action_create()is worth to be refactored that we need to keep theactionorder in the whole Unit. That isreturn,share,proxy, andpass. I did a tiny rework to make the logic clearer.- Introduced
nxt_var_query_final. This is similar tomd5.md5_init,md5_update, andmd5_final. But I can't understand thewaitingin the query. I assumenxt_var_query()always run successfully, I didn't handlefailedyet. - tests are not added yet.
- May
locationis welcome to supportvariablesas well.
Welcome to your suggestion.
Variable: improved parsing in case of empty.
Updated. Added another two rework.
- Style in nxt_var.c.
- Fixed the allocation size of
varindexes.
deleted
Updated. https://gist.github.com/hongzhidao/d9e8869198d4c2af408fe36d0d41e60a
- Variable: improved parsing in case of empty. (done)
- Refactored out nxt_var_info_t for introducing $var_name format. (done)
- Supported $arg_name, $header_name, $cookie_name variables. (draft)
- Make
sharesupportvarialbles. (draft)
@hongzhidao Could you elaborate how 2 should help to implement 3? These structures was made this way for performance reasons. It's better to walk over data that is placed sequentially in memory and have more chances to fit into cache line.
@VBart
First of all, I think there are forms of variables. One is simple such as $host, $uri, etc. The other is complex like $var_name.
These complex variables correspond to array, list, hash structures in Unit code. So the char _ is good to show this form.
And non-HTTP-request has this form. I'd like to call the second part in $var_name as name.
These structures was made this way for performance reasons.
Thanks for your remind, I also notice it.
- At first, I want to place all these like the following. |-- nxt_var_t --|-- indexes --|--positions --|-- names --|-- plain chars --| Since each index size is 32(uint32_t), we can use the high 16bits for var index, and the low 16bits for name index. But It'll make code complex before I know your intention.
- It seems
nxt_var_info_tcan make code simple if it's welcome. (ignore)
that is placed sequentially in memory and have more chances to fit into cache line.
I get it. Good idea.
@hongzhidao Let's first think about how we can make those $arg_*, $http_*, $cookie_* magic variables the most efficient way.
And I suppose that the most significant efficiency we can achieve here if we won't lookup for the same header or the same argument twice or more, i.e. we will cache the result of the first lookup for the request lifetime.
Also we can pre-create the cache for number of needed elements during the configuration phase to avoid lookup for names. It can be just a static index with name values.
Since we have limited number of such magic variables (3 right now) than we can reserve high 2 bits in uint32_t index to identify those magic variables. The rest 30 bits will identify an index position related to the name part.
Then if we see those 2 bits set, then instead of looking for arbitrary variable handler, we call a special variable handler related to this magic variable and pass the rest 30 bits value to it.
This special variable handler look for in his own index at position set by those rest 30 bits for a variable value. If it contains NULL in nxt_str_t.start then it looks for the name in static index that is constructed during configuration phase, and then it does a search for this name in arguments, cookies, or headers and initialize this position for found value or to an empty string if it's not found.
These indexes for $arg_*, $http_*, $cookie_* then can be used for matching rules as well, so they need to be separated from the variables.
What do you think, does it make sense to you?
I suppose that the most significant efficiency we can achieve here if we won't lookup for the same header or the same argument twice or more
Agree. I'll reimplement it again.
BTW, I'm wondering if the variable will be affected after introducing regex support?
I don't know its usage.
@hongzhidao Regex support in its first implementation won't affect variables. Only pattern matching is affected. We will commit it soon (tomorrow probably).
@VBart
- Ready patches. https://gist.github.com/hongzhidao/67a410b3dcabd23791e63a48c47ee7cc
- Variable: improved parsing in case of empty.
- Improved route action absent.
rework
action_createis to support thesharevariable upcoming.
- Speedup match.
does it make sense to you?
Yes.
These indexes for $arg_, $http_, $cookie_* then can be used for matching rules as well, so they need to be separated from the variables.
Very good reminder. And I'll try to do the route match optimization first.
It seems all the queries($var, match) can be unified into index(uint32_t).
@VBart
Take a look, please.
https://gist.github.com/hongzhidao/c9f1957386d180c1b330ee950bff42c1
I use match scheme for experiments.
In my understanding:
we will cache the result of the first lookup for the request lifetime.
Yes.
There are two query forms. One is variable, the other is route matching. Both of them look like xxx_yyy, we can name it as var_name, now. So we need two indexes: var_index and name_index.
var_index corresponds to nxt_var_hash.
name_index corresponds to route_conf->name_hash. (May there are many name_hash).
we can reserve high 2 bits in uint32_t index to identify those magic variables.
Agree.
So there two APIs required with var_index.
nxt_var_hash_index() and nxt_var_index_get()
Because variable support multiple variables, nxt_var_query() is quite good now.
Still going on :)
@VBart https://gist.github.com/hongzhidao/0e8add2f2c1fb7a9472dd7b2ccb04aeb This is clearer.
I suppose that the most significant efficiency we can achieve here if we won't lookup for the same header or the same argument twice or more
We can reach it.
@VBart Could you help check the usage of the duplicate header?
Take a look at the example. http://hg.nginx.org/nginx-tests/file/tip/js_headers.t#l433
With -H "Foo: a" -H "Foo: b". In NGINX, the value of $http_foo is "a". In NJS, the value of $r.variables.foo is "a,b", it looks good.
In Unit, the route matching has already considered this situation. Which way will be accepted in Unit?
Thanks.
@hongzhidao I think for variables the NJS implementation is better.
@hongzhidao It looks like layering violation that you use *_var_* functions for route matches. The mechanism for hashing args/headers/cookies should be more generic and separated from variables and routes. It's just some generic API that allows to fast cached lookups for header fields, cookies and argument in a request object. Then this API can be used in routes and for magic variables.
I think I've verified your suggestions are correct.
It looks like layering violation that you use var functions for route matches
I've found some better naming. Such as.
- router_conf
typedef struct {
nxt_mp_t *pool;
nxt_lvlhsh_t hash;
uint32_t count;
nxt_str_t *names;
} nxt_var_cache_t;
typedef struct {
...
nxt_var_cache_t var_cache;
} nxt_router_conf_t;
-
nxt_var_query_proto rename nxt_var_cache_proto as nxt_var_query_proto. I think we don't need to naming
cachein nxt_var.c. The query has expressed its intent. -
var_param, var_index, param_index
But wait for the complete implementation :)
The mechanism for hashing args/headers/cookies should be more generic and separated from variables and routes.
Yes, I want to move them into nxt_http_request.c.
I will complete these(route matching and variables) in the next step. Then we decide what needs to be separated/refactor.
I think for variables the NJS implementation is better.
'\t' is more suitable than ','. http://www.asciitable.com/
Here's a diff just for checking the idea. https://gist.github.com/hongzhidao/41a230d5cde3633380ae9368fde49a91
@VBart Unified route matching and variable lookup. https://gist.github.com/hongzhidao/dc37812019cee2ff5b7f5d2e290b366a
It's ready to review.
But nxt_entry_handler_t is not clear, I changed a bit.
updated. https://gist.github.com/hongzhidao/c6aacdbf5d3a6ba8718e0956d4a530ac supported another format $headers['X-Some-Header']