unit
unit copied to clipboard
Improved variable parsing.
Hi.
See the test in test/test_variables.py
.
check_variables("\"routes${}\"")
I think the function nxt_var_next_part
should have the same result both at $
and ${}
.
And the empty variable is meaningless.
- Eliminate the check.
if (end - p < 2) {
return NULL;
}
- Then I found ${} is OK in
nxt_var_next_part()
. We need to checklength == 0
. - I like the tricky in the current implementation.
end = p + bracket;
But I did a small rework to make the code more readable.
if (*p == '{') {
bracket = 1;
p++;
if (p == end) {
return NULL;
}
} else {
bracket = 0;
}
start = p;
length = 0;
while (p < end) {
c = *p++;
c = (u_char) (c | 0x20);
if ((c >= 'a' && c <= 'z') || c == '_') {
length++;
continue;
}
if (bracket && c == '}') {
bracket = 0;
break;
}
p--;
break;
}
if (bracket || length == 0) {
return NULL;
}
end = p;
Here's the patch.
deleted
BTW, no need to change test cases.
Add a ticket. Make the share
option support variables.
https://gist.github.com/hongzhidao/d9e8869198d4c2af408fe36d0d41e60a
I think variable
is suitable for action
use, now pass
is supported.
In the real-world, share
is also a common requirement. So I try to support it.
Please note these.
- I've tried to introduce
NXT_CONF_MAP_VAR
and do thenxt_var_compile()
innxt_conf_map_object()
, but it's not a good idea. -
nxt_http_route_action_create()
is worth to be refactored that we need to keep theaction
order in the whole Unit. That isreturn
,share
,proxy
, andpass
. I did a tiny rework to make the logic clearer. - Introduced
nxt_var_query_final
. This is similar tomd5
.md5_init
,md5_update
, andmd5_final
. But I can't understand thewaiting
in the query. I assumenxt_var_query()
always run successfully, I didn't handlefailed
yet. - tests are not added yet.
- May
location
is welcome to supportvariables
as well.
Welcome to your suggestion.
Variable: improved parsing in case of empty.
Updated. Added another two rework.
- Style in nxt_var.c.
- Fixed the allocation size of
var
indexes.
deleted
Updated. https://gist.github.com/hongzhidao/d9e8869198d4c2af408fe36d0d41e60a
- Variable: improved parsing in case of empty. (done)
- Refactored out nxt_var_info_t for introducing $var_name format. (done)
- Supported $arg_name, $header_name, $cookie_name variables. (draft)
- Make
share
supportvarialbles
. (draft)
@hongzhidao Could you elaborate how 2 should help to implement 3? These structures was made this way for performance reasons. It's better to walk over data that is placed sequentially in memory and have more chances to fit into cache line.
@VBart
First of all, I think there are forms of variables. One is simple such as $host, $uri, etc. The other is complex like $var_name.
These complex variables correspond to array, list, hash
structures in Unit code. So the char _
is good to show this form.
And non-HTTP-request has this form. I'd like to call the second part in $var_name as name
.
These structures was made this way for performance reasons.
Thanks for your remind, I also notice it.
- At first, I want to place all these like the following. |-- nxt_var_t --|-- indexes --|--positions --|-- names --|-- plain chars --| Since each index size is 32(uint32_t), we can use the high 16bits for var index, and the low 16bits for name index. But It'll make code complex before I know your intention.
- It seems
nxt_var_info_t
can make code simple if it's welcome. (ignore)
that is placed sequentially in memory and have more chances to fit into cache line.
I get it. Good idea.
@hongzhidao Let's first think about how we can make those $arg_*
, $http_*
, $cookie_*
magic variables the most efficient way.
And I suppose that the most significant efficiency we can achieve here if we won't lookup for the same header or the same argument twice or more, i.e. we will cache the result of the first lookup for the request lifetime.
Also we can pre-create the cache for number of needed elements during the configuration phase to avoid lookup for names. It can be just a static index with name values.
Since we have limited number of such magic variables (3 right now) than we can reserve high 2 bits in uint32_t index to identify those magic variables. The rest 30 bits will identify an index position related to the name part.
Then if we see those 2 bits set, then instead of looking for arbitrary variable handler, we call a special variable handler related to this magic variable and pass the rest 30 bits value to it.
This special variable handler look for in his own index at position set by those rest 30 bits for a variable value. If it contains NULL in nxt_str_t.start
then it looks for the name in static index that is constructed during configuration phase, and then it does a search for this name in arguments, cookies, or headers and initialize this position for found value or to an empty string if it's not found.
These indexes for $arg_*
, $http_*
, $cookie_*
then can be used for matching rules as well, so they need to be separated from the variables.
What do you think, does it make sense to you?
I suppose that the most significant efficiency we can achieve here if we won't lookup for the same header or the same argument twice or more
Agree. I'll reimplement it again.
BTW, I'm wondering if the variable
will be affected after introducing regex support
?
I don't know its usage.
@hongzhidao Regex support in its first implementation won't affect variables. Only pattern matching is affected. We will commit it soon (tomorrow probably).
@VBart
- Ready patches. https://gist.github.com/hongzhidao/67a410b3dcabd23791e63a48c47ee7cc
- Variable: improved parsing in case of empty.
- Improved route action absent.
rework
action_create
is to support theshare
variable upcoming.
- Speedup match.
does it make sense to you?
Yes.
These indexes for $arg_, $http_, $cookie_* then can be used for matching rules as well, so they need to be separated from the variables.
Very good reminder. And I'll try to do the route match
optimization first.
It seems all the queries($var, match) can be unified into index
(uint32_t).
@VBart
Take a look, please.
https://gist.github.com/hongzhidao/c9f1957386d180c1b330ee950bff42c1
I use match scheme
for experiments.
In my understanding:
we will cache the result of the first lookup for the request lifetime.
Yes.
There are two query forms. One is variable, the other is route matching. Both of them look like xxx_yyy, we can name it as var_name, now. So we need two indexes: var_index and name_index.
var_index
corresponds to nxt_var_hash
.
name_index
corresponds to route_conf->name_hash
. (May there are many name_hash).
we can reserve high 2 bits in uint32_t index to identify those magic variables.
Agree.
So there two APIs required with var_index
.
nxt_var_hash_index()
and nxt_var_index_get()
Because variable
support multiple variables, nxt_var_query()
is quite good now.
Still going on :)
@VBart https://gist.github.com/hongzhidao/0e8add2f2c1fb7a9472dd7b2ccb04aeb This is clearer.
I suppose that the most significant efficiency we can achieve here if we won't lookup for the same header or the same argument twice or more
We can reach it.
@VBart Could you help check the usage of the duplicate header?
Take a look at the example. http://hg.nginx.org/nginx-tests/file/tip/js_headers.t#l433
With -H "Foo: a" -H "Foo: b". In NGINX, the value of $http_foo is "a". In NJS, the value of $r.variables.foo is "a,b", it looks good.
In Unit, the route matching has already considered this situation. Which way will be accepted in Unit?
Thanks.
@hongzhidao I think for variables the NJS implementation is better.
@hongzhidao It looks like layering violation that you use *_var_*
functions for route matches. The mechanism for hashing args/headers/cookies should be more generic and separated from variables and routes. It's just some generic API that allows to fast cached lookups for header fields, cookies and argument in a request object. Then this API can be used in routes and for magic variables.
I think I've verified your suggestions are correct.
It looks like layering violation that you use var functions for route matches
I've found some better naming. Such as.
- router_conf
typedef struct {
nxt_mp_t *pool;
nxt_lvlhsh_t hash;
uint32_t count;
nxt_str_t *names;
} nxt_var_cache_t;
typedef struct {
...
nxt_var_cache_t var_cache;
} nxt_router_conf_t;
-
nxt_var_query_proto rename nxt_var_cache_proto as nxt_var_query_proto. I think we don't need to naming
cache
in nxt_var.c. The query has expressed its intent. -
var_param, var_index, param_index
But wait for the complete implementation :)
The mechanism for hashing args/headers/cookies should be more generic and separated from variables and routes.
Yes, I want to move them into nxt_http_request.c
.
I will complete these(route matching and variables) in the next step. Then we decide what needs to be separated/refactor.
I think for variables the NJS implementation is better.
'\t' is more suitable than ','. http://www.asciitable.com/
Here's a diff just for checking the idea. https://gist.github.com/hongzhidao/41a230d5cde3633380ae9368fde49a91
@VBart Unified route matching and variable lookup. https://gist.github.com/hongzhidao/dc37812019cee2ff5b7f5d2e290b366a
It's ready to review.
But nxt_entry_handler_t
is not clear, I changed a bit.
updated. https://gist.github.com/hongzhidao/c6aacdbf5d3a6ba8718e0956d4a530ac supported another format $headers['X-Some-Header']