`pool_max_idle_time` doesn't shut down empty pools
(Split this out of #309)
Problem
As detailed in #309 (there is a script showcasing this), we have an issue where, even with pool_max_idle_time set, we still have process leakage. I just found out exactly why that is: the docs of NimblePool.handle_ping/2 contain a disclaimer:
On lazy pools, if no worker is currently on the pool the callback will never be called. Therefore you can not rely on this callback to terminate empty lazy pools.
Finch is using lazy pools, starting a bunch of empty ones in the beginning (depending on pool count). If they never get a connection, they'll never be shut down.
The relevant code in NimblePool is here:
```elixir
defp do_check_idle_resources(resources, now_in_ms, state, new_resources, remaining_pings) do
  case :queue.out(resources) do
    # if the pool has no workers, we return right away,
    # so handle_ping/2 is never invoked for empty pools
    {:empty, _} ->
      {:ok, new_resources, state}
    # ... the rest of the function handles non-empty pools
```
This isn't a problem when you know where your HTTP requests go. In my use case, I can't know in advance where my HTTP requests will go (there are infinitely many possibilities), and I need my default pool to be decently large since some of these hosts will see a lot of traffic.
Solutions
I'm not sure about the best solution here; here are some rough ideas:
- We could ask for (or contribute to) `nimble_pool` an option to shut down empty pools, so that the pools would actually shut down
- We could implement a strategy to shut down empty pools ourselves, which feels a bit sad though (as NimblePool already has a `ping` mechanic); a rough sketch of what that could look like is below
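To make the second idea a bit more concrete, here is a very rough sketch of a client-side "reaper" process. Everything in it is hypothetical: the `:list_pools` and `:pool_empty?` callbacks are placeholders for whatever Finch's pool registry / PoolManager can actually tell us, and the interval and shutdown handling would need to be adapted to Finch's supervision tree.

```elixir
defmodule MyApp.EmptyPoolReaper do
  # Hypothetical sketch: periodically looks at all known pools and stops the
  # ones that are currently empty. The :list_pools and :pool_empty? options
  # are placeholders; a real implementation would hook into Finch's pool
  # registry / PoolManager instead.
  use GenServer

  @check_interval :timer.minutes(1)

  def start_link(opts), do: GenServer.start_link(__MODULE__, opts, name: __MODULE__)

  @impl true
  def init(opts) do
    state = %{
      list_pools: Keyword.fetch!(opts, :list_pools),
      pool_empty?: Keyword.fetch!(opts, :pool_empty?)
    }

    schedule_check()
    {:ok, state}
  end

  @impl true
  def handle_info(:check_pools, state) do
    state.list_pools.()
    |> Enum.filter(state.pool_empty?)
    |> Enum.each(fn pool_pid ->
      # In Finch this should go through the pool supervisor so the pool is
      # not immediately restarted; GenServer.stop/2 is just for illustration.
      GenServer.stop(pool_pid, :normal)
    end)

    schedule_check()
    {:noreply, state}
  end

  defp schedule_check, do: Process.send_after(self(), :check_pools, @check_interval)
end
```

Something like this could be started in the application's supervision tree with the two callback functions passed in as options, but it duplicates work NimblePool's ping mechanic is meant to do.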
That said, with the current workings I'm also a bit worried about shutting down pools. If I'm not missing anything: say you configure a pool count of 15. You may shut down 14 pools via idle verification, one survives, and now if a load spike comes that one pool has to deal with all the traffic, as no new pools are started again. With a set count of pools in a... pool group (? not sure what to call it) it may be better to shut down all pools at once or none at all, to avoid the scenario described.
That'd be different if we used dynamic pool creation & closure - but I might open another issue about that.
I'm happy to open an issue at NimblePool and suggest the change, but so far it seems very deliberate on their part. So, I'd first like to align on a way forward here.
As always, thank you a ton for providing us all with great and free open source software!
The only way I see to achieve this behaviour (i.e. idle termination of empty pools) would be with some pool idle callback such as the one I proposed here: https://github.com/dashbitco/nimble_pool/issues/46#issue-2628197378
At the time, the nimble_pool maintainers suggested moving forward with a client-side implementation. With new use cases arising, we may reopen the discussion over there. What do you think?
Thanks as always @oliveigah! Really appreciate the level of ownership you have shown around this idle termination stuff.
At work, we use Finch strictly for its original intended purpose of hammering a static set of hosts, so it will be hard for me to dedicate working hours in this area.
I agree, it seems like this is something that would only be possible through Finch and our PoolManager registry at this point in time and am open to exploring that area together.
Thank you for the words @sneako ! I'm happy to help, especially on the features (and bugs, haha) I've introduced in order to not make the maintenance even harder for you and others.
That said, I'm currently on vacation so I may not be able to take a serious look into this for the next few days. But I will read through all of @PragTob's issues and contribute to the discussions whenever I can.
As soon as I get back home (~10 days) I will be able to tackle some implementations more thoroughly
For now I think this specific issue (empty pool termination) could be solved by using the activity info data I'm introducing in the other PR.
But I agree that, considering all the issues and improvement proposals, we may be better off doing something more strategic on the pool manager. Let's see.
@oliveigah thank you, and have a great vacation, it's important to recharge!
I also can't immediately promise work time, as the one service that runs into this dies a slooooowwww death right now, so it isn't as important. On OSS time I need to get some benchee stuff done. But I'll see what I can do, this intrigues me and seems fun! (edit: meaning to indicate that it's also not super urgent over here)
Thanks both of you!
I brought up this case in the NimblePool issue, but just to double check if my understanding is correct:
If we don't eagerly start all pools, but instead dynamically start them as proposed in https://github.com/sneako/finch/issues/313, we should never have an empty unknown pool, right?
If that's the case, it might be better to focus on dynamically starting pools instead of addressing this issue separately.
What do you all think?
@oliveigah That aligns with my understanding. I'm not sure about the "never have an empty pool" part - I guess connections may be terminated/removed for other reasons, but my Finch knowledge isn't deep enough.
I think it's definitely the better feature; I'm just not sure we can focus only on that, as it'd change runtime behavior. Like, I'm not sure whether eagerly starting all pools should be completely replaced with dynamic starts, or whether that mode should stay "broken" in combination with closing pools not working.
I do agree though that dynamically starting them is probably what I'd want more as a feature, and Finch is still in 0.x :)
You're absolutely right! We can still have empty pools due to connection errors or other code paths that might terminate a worker.
What I meant to say is that, on the happy path, we should no longer have empty pools (assuming we implement the suggestion from #313). However, you're correct that if something goes wrong and a connection is closed for some reason, the pool might become empty and, as a result, never be terminated. So the problem still exists, but only on the unhappy path.
The NimblePool maintainer has suggested a new generic handle_pool_info callback that may help us implement our own logic here to solve the unhappy path problem. Let's see how that evolves so we can make use of it.
That said, I still believe that, for the happy path (which covers the majority of users), the improvement proposed in #313 might already solve this problem.
Hi @PragTob, I'm looking into this as well and noted that you wrote
This isn't a problem when you know where your HTTP requests go.
I gather that if you do know where your requests go, then you can create a specific pool for that since that pool won't be dynamic and therefore not subject to the limitations of dynamic pools. Did I understand you correctly there?
I gave the configuration below a try, but I'm still seeing requests fail due to the host disconnecting on me.
```elixir
{Finch,
 name: MyAppFinch,
 pools: %{
   :default => [],
   "https://example.com" => [pool_max_idle_time: 60_000]
 }}
```
I'm sorry in advance if I'm way off here, of course!
Hey @carlgleisner - I don't think you even need to configure it separately: the default pool configuration will just be used to create the pool for the one known URL in that case. The issue here, iirc, is very much that we'd open new pools for all the different URLs we'd hit (sometimes they'd have tracking ids as part of the origin as well).
So yeah you shouldn't be running into that issue.
In your particular case, pool_max_idle_time isn't the configuration option you wanna look at, as that's how long the pool can/should be idle. You probably want to look at the connect options for individual connections - I know they exist, but right now I don't know exactly where. Might be somewhere in connect_opts that get passed down to Mint - but I'm on a train now and so can't do a lot of research. :)
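For what it's worth, and without having checked the docs from here (so treat the option names as assumptions to verify), something along these lines might be closer to what you're after. I believe the per-connection options are called conn_opts (handed down to Mint.HTTP.connect/4), and there is also a conn_max_idle_time controlling how long a single connection may stay idle:

```elixir
# Sketch only: double-check conn_max_idle_time, conn_opts and transport_opts
# against the Finch and Mint docs before relying on them.
{Finch,
 name: MyAppFinch,
 pools: %{
   :default => [],
   "https://example.com" => [
     # how long a single connection may sit idle before being discarded
     conn_max_idle_time: 30_000,
     # options handed down to Mint.HTTP.connect/4
     conn_opts: [transport_opts: [timeout: 10_000]]
   ]
 }}
```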
On another note, for this particular issue: I'm not working at the company where I encountered it any more, so my incentives to go and fix it are sadly much lower now.
Many thanks for taking the time to reply @PragTob!