r2d2 icon indicating copy to clipboard operation
r2d2 copied to clipboard

Fail Fast for Certain Errors

Open mariotaku opened this issue 1 year ago • 9 comments
trafficstars

Hi! I implemented a SSH connection pool in my project, and noticed when the connection can't be created (no route to host, failed to authenticate etc), get_timeout will actually retry until timeout.

I'd like to have an option to allow fail-fast for such case. In my fork, I added can_retry function in HandleError, and when it returns false, it will fail immediately without retry.

https://github.com/sfackler/r2d2/compare/6ab3302eeca82732a5e6c9385092fdb6e011f173...aaa2a2ba08582ad685ecdaaa74cc4ac47fe46efa

Do you think this feature makes sense? If so, I'll create a pull request.

mariotaku avatar Mar 12 '24 12:03 mariotaku

@sfackler would you mind taking a look into this? i would greatly appreciate this feature...

https://github.com/sfu-db/connector-x/issues/710

guilhermedelyra avatar Nov 21 '24 17:11 guilhermedelyra

@guilhermedelyra I have tested that feature for a while and it worked pretty fine. In the meantime you could use patch in Cargo.toml to use my commit ID for now.

mariotaku avatar Nov 22 '24 03:11 mariotaku

@guilhermedelyra I have tested that feature for a while and it worked pretty fine. In the meantime you could use patch in Cargo.toml to use my commit ID for now.

thanks for the suggestion! although i'm not sure if i'll be able to do this...

i'm using python's Connectorx, that relies on rust's Connectorx implementation, which has a wrapper on top of r2d2 😅

since i'm not building connectorx from scratch (i'm using pip install connectorx), i dont think i'm able to patch the lib hehe

but again, thanks for starting the discussion and opening the PR 😀

guilhermedelyra avatar Nov 23 '24 03:11 guilhermedelyra

Hey @sfackler, bumping this issue as the retry logic currently performs 7 attempts over 30 seconds. When coupled with a wrong password in an enterprise setting, MSSQL accounts get locked as it exceed the number of attempts allowed by DBA. Hence, having the option to fail fast can prevent this and allow for immediate remediation.

pangjunrong avatar Nov 24 '24 10:11 pangjunrong

Do other connection pool libraries have this concept? What kinds of errors other than auth issues would be considered non-retryable?

sfackler avatar Nov 24 '24 13:11 sfackler

@sfackler SQLAlchemy has something similar: https://docs.sqlalchemy.org/en/20/core/pooling.html#supporting-new-database-error-codes-for-disconnect-scenarios

As for the scenarios, I'm using r2d2 for maintaining SSH connections to devices in local network. When connection refused error occurs, it considers the host as unreachable immediately, rather than waiting for the timeout.

mariotaku avatar Nov 24 '24 16:11 mariotaku

That SQLAlchemy case seems to be covering detection disconnections of active sessions, not failing early on pool creation.

sfackler avatar Nov 24 '24 17:11 sfackler

@sfackler any status on this? 😃

guilhermedelyra avatar Nov 28 '24 16:11 guilhermedelyra

I am still waiting to see an example of another connection pool library that does this.

sfackler avatar Dec 18 '24 17:12 sfackler

I think that rather failing fast for certain errors, what would be most useful is to give us an API to stop a connection from retrying, or to manually close a connection/pool.

surister avatar Oct 30 '25 14:10 surister