aerospike-client-go
aerospike-client-go copied to clipboard
Exit fast on conection pool full
Thanks for the PR. What is the use case in which this is necessary?
@khaf Thank you for replying fast. So our server need to reply to request in the fixed and small amount of time. Replying smth like "Cannot calculate" is possible thought.
So without this fix, when the connection pool is full, aerospike-client-go waits (Timeout) in the worst case and replies only then.
In our case (and I think in many other cases) it is much better to fail fast with error in this case.
However I added flag to preserve old behavior WaitForConnectionsToBeRealeased when needed
This is a really nice feature, there should be an option to fail fast if there are no connections available.
Any timeline when this feature will be merged?
The client has an internal retry mechanism, and it does not regard a lack of available pool connections as a failure, since another connection may become available a microsecond later.
I don't think I'll merge this PR, but we may change the client in a way that supports this use case by counting lack of connections in the pool as a failure. In that case a Policy.MaxRetries = 1 would be the same as a fail fast option.
Feedback is welcome.
@khaf You are right, currently fail fast can be achieved by using Policy.MaxRetries = 0, zero because the first request doesn't counts as retry and I don't want to try again. Mind explaining why you chose Policy.MaxRetries = 1?
If I am not wrong than policy.TotalTimeout can also be used for the same.
@saifat29 You are correct, that was a slip. This does not work at the moment since the iterations begin at -1, and this logic.
I will need to change that a bit to allow for this mechanism.
If you will change the code so that MaxRetries = 1 will work - this is great =) However what we have is
- MaxRetries = 2 (to try second time)
- Timeout = 5s (very big value. should trigger only when there is a network problem. Lowering it down does not really work for us because when connection is timing out - it is closed. And creating new connection take time and resources) So Timeout = 50ms will not work - because many connections will be closing/reopening
I guess if MaxRetries=1 is the easiest fix - lets do it. We can retry on the client side - not a big deal.
@gburanov Are you aware of the two distinct Timeout settings in the policies? There is SocketTimeout and then the TotalTimeout which allows you to fine tune these. Have you tried them?
yes, There is a socket timeout description https://github.com/aerospike/aerospike-client-go/blob/d33d4fcf291570041ca447b1e76c0f252fcaa90c/policy.go says If SocketTimeout is not zero and SocketTimeout is reached before an attempt completes, the Timeout above is checked. If Timeout is not exceeded, the transaction is retried . This is not really what we want I think...
So what we want is
-
Close aerospike connections rare - only when there is a networking issue => TotalTimeout should be big (5s)
-
However we should fail fast in the case I described (no connections)
-
we should exit in maximum xxx ms time (to cut off 1% slow loads/writes to aerospike). Now we do it by wrapping writes
doneChan := make(chan struct{})
go func() {
rec, err = c.client.Get(policy, key, binNames...)
close(doneChan)
}()
select {
case <-ctx.Done():
c.userLoadAerospikeContextCancelled.Inc(1)
return nil, ctx.Err()
case <-doneChan:
}
return rec, err
If u have some other way of implementing it - It would be great to know =)
I think the best method in my opinion is to set Policy.MaxRetries = 0 or whatever number of times the client is willing to wait if there are no connections in the pool.
This way client will set Policy.MaxRetries = 0 if it wants to return immediately. Or Policy.MaxRetries = n if the client is willing to try "n" number of times.
I tried setting Policy.MaxRetries to zero but it doesn't works. Check this condition
@gburanov is correct about not using TotalTimeout as a fast exit mechanism. I tried it and noticed that TotalTimeout closes the connection which is not good as resources are getting wasted.
So to sum it up, if Policy.MaxRetries = 0 returns instantly with an appropriate error this issue will be solved.
So what was decided, how should we proceed?
We need to discuss this internally since it may affect all the clients. The next meeting is early next week. I'll get back to you as soon as we have a decision.
@khaf how is it going? =)
Sorry for the late reply @gburanov , we are still at it, and discussed it again in a meeting today.
I'll get back to you soon.
@khaf any update on this?
Yes, this is coming with the new v5 release towards the end of this week.
Yes, this is coming with the new v5 release towards the end of this week.
@khaf is there a commit you can reference for that change? I could not find it on the v5 branch.
@gburanov is correct about not using
TotalTimeoutas a fast exit mechanism. I tried it and noticed thatTotalTimeoutcloses the connection which is not good as resources are getting wasted.
@saifat29 I think I am being bitten by this problem for some time now. This is more or less what I observe happening when looking at a tcpdump traffic dump:
- client has a long-standing TCP connection open with server (let's say 10+ minutes) which has been used for many queries/commands
- most queries have a
TotalTimeoutof1 secondon the policy - when such queries time out, the connections are closed (!)
- this aggravates the timeout problem because now new connections will have to be opened
NOTE: I am using the master branch
Shall I open a new bug about (4)? If it is confirmed that an expiring TotalTimeout leads to closed connections then I think we should fix it, because an user-set timeout does not imply that the underlying connection is bad. TCP has its own flags to detect that. I guess I could use @gburanov's approach until this is solved.
Done in Nov 2021 per this commit: https://github.com/aerospike/aerospike-client-go/commit/943c03c74b61380fe3e2d416c99881401b436fff
Unfortunately, I forgot to mention it here.