HikariCP icon indicating copy to clipboard operation
HikariCP copied to clipboard

Why is the minimum keep alive 30 seconds?

Open Randgalt opened this issue 1 year ago • 17 comments

Why is the minimum keep alive 30 seconds?

This seems like an arbitrary number and quite high. Why not allow smaller times?

Randgalt avatar Sep 19 '23 10:09 Randgalt

I would ask the opposite question, do you have a use-case where you would need a shorter time than that?

The only reason you would need a keep alive under 30 seconds would be if your database server closes connections automatically that have been quiet for 30 seconds, and this seems like an unlikely situation.

lfbayer avatar Sep 19 '23 11:09 lfbayer

Yes, we would like to have a 15 second keep alive. Yes, our DB is currently configured to close connections every 15 seconds in some circumstances.

Randgalt avatar Sep 19 '23 11:09 Randgalt

But why would you have such a configuration. What problem does that solve?

This context would be helpful to be able to understand what actually needs exist. It seems like if your database closes connections that quickly you wouldn’t really want to use a keepalive in the first place. Would it make more sense to increase the idle timeout in the database for this known application rather than require the application to send otherwise unnecessary traffic just to keep the connection open.

If the short timeout is to avoid leaked connections, surely you don’t have that problem with the application running HikariCP.

On Tue, Sep 19, 2023 at 8:20 PM Jordan Zimmerman @.***> wrote:

Yes, we would like to have a 15 second keep alive. Yes, our DB is currently configured to close connections every 15 seconds in some circumstances.

— Reply to this email directly, view it on GitHub https://github.com/brettwooldridge/HikariCP/issues/2112#issuecomment-1725308968, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAHTCJFJPPNFDADOWCHQCB3X3F5YXANCNFSM6AAAAAA46DAHKE . You are receiving this because you commented.Message ID: @.***>

lfbayer avatar Sep 19 '23 12:09 lfbayer

I promise you we have this situation. I'm not sure what benefit it would be to explain our entire application. It's a simple request: please lower the minimum allowed keep alive. It's one line of code for you and really helps us out. If you can't do that, that's too bad.

This context would be helpful to be able to understand what actually needs exist

We've spent weeks with our DB supplier working this out. Honestly, I'm not up for re-litigating this. I hope you can trust that we have good reason for the request.

Randgalt avatar Sep 19 '23 12:09 Randgalt

I apologize if my tone sounded argumentative. That was not intentional. I just feel a responsibility to understand the underlying use cases before making changes that could lead to problems for other users.

I genuinely want to understand what the reasons are for having such a short idle timeout while still not wanting to close the connections. This knowledge can help me understand if there are other potential changes that would be reasonable as well as to be able to understand the risks more.

It would also be helpful to have a pull request with the change and relevant unit tests so that we can feel more comfortable that we aren’t breaking anything.

On Tue, Sep 19, 2023 at 9:55 PM Jordan Zimmerman @.***> wrote:

I promise you we have this situation. I'm not sure what benefit it would be to explain our entire application. It's a simple request: please lower the minimum allowed keep alive. It's one line of code for you and really helps us out. If you can't do that, that's too bad.

This context would be helpful to be able to understand what actually needs exist

We've spent weeks with our DB supplier working this out. Honestly, I'm not up for re-litigating this. I hope you can trust that we have good reason for the request.

— Reply to this email directly, view it on GitHub https://github.com/brettwooldridge/HikariCP/issues/2112#issuecomment-1725460322, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAHTCJHDKPHLDDSDCFUCYJLX3GI3VANCNFSM6AAAAAA46DAHKE . You are receiving this because you commented.Message ID: @.***>

lfbayer avatar Sep 19 '23 13:09 lfbayer

And to answer your first question, I don’t know why that minimum is there. It might be intentional or it might have been arbitrary. I would have to investigate further to know.

On Tue, Sep 19, 2023 at 10:10 PM Leo Bayer @.***> wrote:

I apologize if my tone sounded argumentative. That was not intentional. I just feel a responsibility to understand the underlying use cases before making changes that could lead to problems for other users.

I genuinely want to understand what the reasons are for having such a short idle timeout while still not wanting to close the connections. This knowledge can help me understand if there are other potential changes that would be reasonable as well as to be able to understand the risks more.

It would also be helpful to have a pull request with the change and relevant unit tests so that we can feel more comfortable that we aren’t breaking anything.

On Tue, Sep 19, 2023 at 9:55 PM Jordan Zimmerman @.***> wrote:

I promise you we have this situation. I'm not sure what benefit it would be to explain our entire application. It's a simple request: please lower the minimum allowed keep alive. It's one line of code for you and really helps us out. If you can't do that, that's too bad.

This context would be helpful to be able to understand what actually needs exist

We've spent weeks with our DB supplier working this out. Honestly, I'm not up for re-litigating this. I hope you can trust that we have good reason for the request.

— Reply to this email directly, view it on GitHub https://github.com/brettwooldridge/HikariCP/issues/2112#issuecomment-1725460322, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAHTCJHDKPHLDDSDCFUCYJLX3GI3VANCNFSM6AAAAAA46DAHKE . You are receiving this because you commented.Message ID: @.***>

lfbayer avatar Sep 19 '23 13:09 lfbayer

I'm looking at the code now and I don't actually see any enforcement of a minimum keepalive time. In fact the unit test for keepalive time actually uses 500ms as the test value. Where are you seeing a minimum of 30 seconds?

I do see that the housekeeping thread only runs once every 30 seconds, but that code doesn't do anything with keepalive.

lfbayer avatar Sep 19 '23 14:09 lfbayer

After investigating even more and actually trying to set the value to something less than 30 seconds I found where the check is. I also looked over the original pull request, and have come to the determination that I still do not know why there is a minimum and nothing in the original pull request seems to shed any light on it.

I'm not completely against lowering that minimum check, but I would like to know the specifics of the situation that actually requires such a short idle timeout.

lfbayer avatar Sep 19 '23 15:09 lfbayer

I apologize if my tone sounded argumentative.

No worries - thank you for your time

Randgalt avatar Sep 21 '23 06:09 Randgalt

I'm not completely against lowering that minimum check, but I would like to know the specifics of the situation that actually requires such a short idle timeout.

We are going to change our timeout to 30s. But, for the future I'll try to put together a PR with a proposed change and an explanation of what we're trying to accomplish.

Thank you.

Randgalt avatar Sep 21 '23 06:09 Randgalt

Except if there is a reason for it, I think the lib should just let the user sets the value it wants. Its pretty sneaky to disable keepalive when below 30s. A use case: fast detection of writer becoming a reader, without relying on max lifetime. (ok it relies on the fact that the keep alive task uses the validation query AND evicts the connection on failure, so maybe thats also a little sneaky on my part).

mathieufortin01 avatar Oct 04 '23 18:10 mathieufortin01

How about giving a warning log when keepaliveTime is set below 30 or a number ??

gary258796 avatar Nov 16 '23 15:11 gary258796

I haven't really investigated this yet, but I'm concerned about how the scheduling of the housekeeper thread works. It seems plausible that even with a setting below 30, if the housekeeper doesn't run it won't matter. So if the housekeeper is running in increments of 30 seconds, then a setting less than 30 seconds wouldn't be honored anyway.

But again, I haven't really investigated this. It's just a feeling.

lfbayer avatar Nov 16 '23 16:11 lfbayer

I haven't really investigated this yet, but I'm concerned about how the scheduling of the housekeeper thread works. It seems plausible that even with a setting below 30, if the housekeeper doesn't run it won't matter. So if the housekeeper is running in increments of 30 seconds, then a setting less than 30 seconds wouldn't be honored anyway.

But again, I haven't really investigated this. It's just a feeling.

I check what HouseKeeper thread handle., but don't see anything in HouseKeeper related with keepAlive.

gary258796 avatar Nov 18 '23 02:11 gary258796

As I said, it was just a suspicion. The real question is probably if we allow the setting, does it actually cause the keep alive to run at the expected time?

lfbayer avatar Nov 19 '23 10:11 lfbayer

I create a simple SpringBoot project on my local machine, which have configuration in applicaiton.properties like below

spring.datasource.hikari.keepalive-time=20000
spring.datasource.hikari.minimum-idle=1
spring.datasource.hikari.maximum-pool-size=5

and -Dcom.zaxxer.hikari.housekeeping.periodMs=40000 set in system properties.

Screenshot 2023-11-20 at 8 14 42 PM

From the console log above , we can see the keepAlive do run every 18 seconds and houseKeeper do run every 38 seconds, which is correspond to configuration. So i guess, no matter of what keepAliveTime and HouseKeeper are set to, keepAlive task will still in the way we configure.

gary258796 avatar Nov 20 '23 13:11 gary258796

@lfbayer , I see you put "volunteer-needed" label. Does it mean you came to conclusion that 30s minimum for keep-alive doesn't have enough arguments to exist and you just need someone to open PR?

I see a number of other properties with minimum values which also are a subject to removal (imho): 30s for max-lifetime, 2s for leak detection, 250ms for connection timeout, 250ms for validation timeout.

250ms for connection timeout and validation also seems to have lack of argumentation as my p99 response timing is 10ms. Probably i'd prefer having 250ms timeout for queries being run in keep-alive thread, but not in application logic thread. As in case of 10ms timeout I have enough time to use another connection (which may be not broken)

ruzovvo avatar Mar 26 '24 18:03 ruzovvo