lua-nginx-module
feature: implemented dynamic keepalive connection pooling in 'balancer_by_lua*'.
Summary
This PR implements support for dynamic connection pools in NGINX's upstream {} blocks, similar to NGINX's keepalive directive, controlled by Lua APIs.
Background
The need for this feature was originally driven by NGINX limitations around upstream keepalive pools: when using the ngx_http_upstream_module's keepalive directive, connections are reused solely based on their remote IP and remote port attributes, ignoring, for example, TLS attributes. The following tickets describe those limitations in more detail, with example configurations to reproduce them:
- https://trac.nginx.org/nginx/ticket/1340
- https://trac.nginx.org/nginx/ticket/1593
These limitations can be addressed by defining several upstream {} blocks, or by disabling upstream keepalive connection pooling altogether.
A PoC patch was developed last year to address this issue in Kong (https://github.com/Kong/openresty-patches/blob/be8b79bd3a97cd8d1d9fce2da70a2be0a57af7d6/patches/1.13.6.1/nginx-1.13.6_01-upstream-keepalive.patch) and has proven itself successful in fixing the limitations: https://github.com/Kong/openresty-patches/pull/40.
A subsequent and more complete patch was recently submitted to the nginx-devel mailing list, implementing a way to dynamically allocate upstream connection pools: http://mailman.nginx.org/pipermail/nginx-devel/2019-August/012580.html. Alas (and unsurprisingly), the patch was not welcomed.
Proposed API
Since it is typical for an OpenResty application to define a limited number of upstream {} blocks that are shared between upstream services, this limitation should be overcome in OpenResty itself. The decision to implement the connection pooling in this module has a few major benefits:
- It avoids having to write and maintain another NGINX core patch.
- It allows for dynamic and granular control over the upstream connection pools, which was not possible with the keepalive directive.
- It makes the Lua balancer module integrate better with other stock NGINX upstream modules.
- It allows for specifying an indefinite idle timeout and an indefinite max number of requests, unlike the NGINX upstream keepalive module since 1.15.3.
Example usage:
balancer_by_lua_block {
    local balancer = require "ngx.balancer"

    local ok, err = balancer.set_current_peer("127.0.0.2", 8080)
    if not ok then
        ngx.log(ngx.ERR, "failed to set current peer: ", err)
        return
    end

    -- default pool will be "host:port"
    -- default pool_size will be 30
    ok, err = balancer.enable_keepalive()
    if not ok then
        ngx.log(ngx.ERR, "failed to set keepalive: ", err)
        return
    end
}
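To illustrate how this ties back to the TLS limitation described in the Background section, the pool name can encode whatever attributes should segregate connections (for example the SNI name). The following is a hedged sketch only: it assumes the opts key is named pool (matching the default pool of "host:port" mentioned in the comments above) and that enable_keepalive accepts an idle timeout in seconds and a max request count, per the sister PR's documentation; ngx.ctx.upstream_sni is a hypothetical value set elsewhere by the application.

balancer_by_lua_block {
    local balancer = require "ngx.balancer"

    -- hypothetical application value assumed to have been set earlier in
    -- the request (e.g. in access_by_lua*)
    local sni = ngx.ctx.upstream_sni or "default.example.com"

    -- segregate keepalive connections by SNI in addition to the address,
    -- so that connections established for one TLS server name are never
    -- reused for another
    local ok, err = balancer.set_current_peer("127.0.0.2", 8443, {
        pool = "127.0.0.2|8443|" .. sni,
    })
    if not ok then
        ngx.log(ngx.ERR, "failed to set current peer: ", err)
        return
    end

    -- 60s idle timeout, at most 100 requests per connection
    ok, err = balancer.enable_keepalive(60, 100)
    if not ok then
        ngx.log(ngx.ERR, "failed to enable keepalive: ", err)
        return
    end
}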
Documentation
The documentation is available in the lua-resty-core sister PR: https://github.com/openresty/lua-resty-core/pull/276
Additional commits
This PR also contains a couple of other improvements:
- It removes the need for defining a stub server directive in the upstream {} block.
- It refactors the stashing of the balancer peer data structure to be more efficient and more intuitive (i.e. the pattern to retrieve the peer data is identical in the upstream keepalive module interface and in the FFI interface: r->upstream->peer.data).
Sister PRs
- https://github.com/openresty/lua-resty-core/pull/276
I hereby grant the copyright of the changes in this pull request to the authors of this lua-nginx-module project.
@agentzh @thibaultcha Hello, how is this PR going? I think this feature is great: it allows us to use only one upstream block while managing our own connection pools according to our requirements, achieving what would otherwise need multiple upstream blocks in Nginx.
If we are worried about the total number of connection pools or connections, could we add a configuration limit similar to lua_max_pending_timers?
In addition, I think that shutting down a pool by name and adjusting a pool's size would also be very useful tools, letting users manage all of their connection pools.
@rainingmaster We no longer need multiple upstream {} blocks; that's the very reason balancer_by_lua* was introduced in the first place. The biggest limitation right now is that we only respect socket addresses (instead of also taking into account SNI, etc.) when reusing a connection, which can be addressed relatively easily by patching or forking the existing ngx_http_upstream_keepalive module. This PR tries to do a lot more, which is a bit overwhelming (at least to me).
@thibaultcha Shall we take smaller steps so that it's easier to review and also safer to incorporate? I appreciate your work here. Just want to be cautious and conservative for such relatively big things.
OK, I had a closer look. It looks good generally and we can have it if those remaining concerns are addressed.
I have the following questions or concerns:
- we need some kind of upper limit for the total number of connection pools. Once the limit is reached, we can evict the least recently used pools (LRU) to make room for new ones.
- will the connection pools get automatically freed when there are no idle connections in them?
- what happens when there is a CRC32 collision? Is it possible that we misuse a connection? It's better if we also compare the socket address, etc. to make sure. CRC32 collisions are easy to produce.
- I don't think infinite max idle time is a good idea...
- Is the default pool size (30) configurable via some nginx directive?
@agentzh Ah, thanks! I split the commits up into different chunks, separated by feature, so reviewing them individually will be easier. This PR is quite old now, so I need to re-familiarize myself with it, but I recall reviewing it very extensively and feeling that it would greatly improve the realm of possibilities for dynamic load balancing in OpenResty, rather than keep relying on the Nginx keepalive module; hence the decision to implement our own keepalive module. It gives us extra flexibility for future use cases of connection pool management, monitoring, etc.
- we need some kind of upper limit for the total number of connection pools
I can certainly add this, but I then wonder: why doesn't the cosocket API suffer from the same flaw? ngx_http_lua_socket_tcp_create_socket_pool is not bound to any upper limit.
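For illustration only (this is not part of this PR, and a real implementation would also need to close any idle connections held by an evicted pool), a rough pure-Lua sketch of the kind of LRU-based upper bound being suggested, using the existing lua-resty-lrucache library:

local lrucache = require "resty.lrucache"

-- cap the total number of pools at 100; adding a 101st pool silently
-- evicts the least recently used one
local pool_cache = assert(lrucache.new(100))

local function get_or_create_pool(name, size)
    local pool = pool_cache:get(name)
    if not pool then
        pool = { name = name, size = size, connections = {} }
        pool_cache:set(name, pool)
    end
    return pool
end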
- will the connection pools get automatically freed when there's no idle connections in them?
Yes, see https://github.com/openresty/lua-nginx-module/pull/1600/files#diff-998ae59f12e2d4f8f745561e5c824912R900-R902 and the test cases, e.g. this one: https://github.com/openresty/lua-resty-core/pull/276/files#diff-de1e53807cbda85df1ae20d983bd76efR805
- what happens when there is a CRC32 collision?
Currently, the pools are stored in a Lua table kept in the Lua registry. I believe this is because I followed the approach taken in the cosocket implementation (see https://github.com/openresty/lua-nginx-module/blob/master/src/ngx_http_lua_socket_tcp.c#L360).
However, a better implementation would use the Lua registry reference system instead (we have already been using it successfully since refactoring the Lua code cache system in https://github.com/openresty/lua-nginx-module/commit/39c2ead3cb7efae4456494ea27a348fa1eca6c5c). A Lua-land table stored as an upvalue in lib/ngx/balancer.lua could then map the pool names (as strings) to references to their Lua registry C pointer counterparts. This way, we would also avoid having to compute any CRC32 hashes in the first place.
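A conceptual, pure-Lua sketch of that registry-reference idea (not the C implementation in this PR); the create_ref argument stands in for the hypothetical C/FFI call that would luaL_ref() the newly created pool in the Lua registry:

-- upvalue in lib/ngx/balancer.lua: pool name (string) -> registry reference
local pool_refs = {}

local function ref_for_pool(name, create_ref)
    local ref = pool_refs[name]
    if not ref then
        -- string keys cannot be conflated the way two colliding CRC32
        -- hashes of different names can
        ref = create_ref(name)
        pool_refs[name] = ref
    end
    return ref
end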
- I don't think infinite max idle time is a good idea...
An indefinite max idle time was the only value in use for a long time, until the release of Nginx 1.15.3 in 2018. That commit, whose intent was to fix a tangentially related issue, caused a feature regression here (as I recall, Nginx never actively closed its upstream connections before). While having a default upstream keepalive timeout of 60s is a good idea, the fix should not break previously developed systems that may have been relying on Nginx never actively closing upstream connections. As such, this PR still proposes a default timeout of 60s and a default max number of requests of 100 (https://github.com/openresty/lua-resty-core/pull/276/files#diff-973c4ef5939470d0a1c529e4e692db8eR106-R108), but allows for preserving the pre-Nginx 1.15.3 behavior of not actively closing connections.
Besides, the tcpsock:setkeepalive API also allows for an indefinite idle timeout; this PR intentionally unifies the user experience of the balancer and cosocket network APIs.
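For example (a sketch, assuming that passing 0 disables the corresponding limit, mirroring tcpsock:setkeepalive's indefinite idle timeout):

balancer_by_lua_block {
    local balancer = require "ngx.balancer"

    assert(balancer.set_current_peer("127.0.0.2", 8080))

    -- idle_timeout = 0 and max_requests = 0: never actively close idle
    -- connections, preserving the pre-1.15.3 behavior
    assert(balancer.enable_keepalive(0, 0))
}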
- Is the default pool size (30) configurable via some nginx directive?
It currently is a hard-coded Lua-land constant: https://github.com/openresty/lua-resty-core/pull/276/files#diff-973c4ef5939470d0a1c529e4e692db8eR106. Since the emphasis in this PR was put on making OpenResty more dynamic, I have been reluctant to introduce new Nginx directives...
The reasoning being that either:
- one does not use the balancer.enable_keepalive() API, and isn't concerned with this default.
- one uses the balancer.enable_keepalive() API, and if so may change the defaults by simply adding a new opts table to their balancer.set_current_peer() calls (see the sketch below), in an area of their codebase related to the balancing code.
Overall, I believe that configuring Nginx should be done in nginx.conf, and configuring OpenResty should mostly be done in Lua code, thus reducing the "sovereignty" that Nginx patterns have over OpenResty, and allowing developers to rely less and less on a static nginx.conf over time... But if that is not the decision you wish to take going forward, I can add a directive here; it is a short and easy task indeed. My concern is more philosophical :)
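A sketch of such an override, assuming the opts key is named pool_size as suggested by the default mentioned in the example above:

balancer_by_lua_block {
    local balancer = require "ngx.balancer"

    -- overriding the hard-coded default pool_size of 30 for this peer
    assert(balancer.set_current_peer("10.0.0.5", 8080, { pool_size = 64 }))
    assert(balancer.enable_keepalive())
}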
@doujiang24 @spacewander hihi, could you help take a look at this together? I also think that, in the future, we could support limiting the number of connections to an upstream (like the backlog option in ngx.socket.tcp) based on this feature, which is not supported in open source Nginx.
This pull request is now in conflict :(
- what happens when there is a CRC32 collision?
Currently, the pools are stored in a Lua table kept in the Lua registry. I believe this is because I followed the approach taken in the cosocket implementation (see https://github.com/openresty/lua-nginx-module/blob/master/src/ngx_http_lua_socket_tcp.c#L360).
However, a better implementation would use the Lua registry reference system instead (we have already been using it successfully since refactoring the Lua code cache system in 39c2ead). A Lua-land table stored as an upvalue in lib/ngx/balancer.lua could then map the pool names (as strings) to references to their Lua registry C pointer counterparts. This way, we would also avoid having to compute any CRC32 hashes in the first place.
Hi @thibaultcha, have any of the changes mentioned here been applied to Kong's openresty-patches? I am using Kong 2.2.2 and suffered a CRC32 collision in my environment. Details are here. Also, I have a question: why do we use crc32 here instead of just using the pool names as keys? Is that for performance reasons? Thanks.