puddle
The function Acquire does not return immediately when ctx is Done.

In pool.go:
select {
case <-ctx.Done():
	// Allow goroutine waiting for signal to exit. Re-signal since we couldn't
	// do anything with it. Another goroutine might be waiting.
	go func() {
		<-waitChan
		p.cond.Signal()
		p.cond.L.Unlock()
	}()
	p.cond.L.Lock()
	p.canceledAcquireCount += 1
	p.cond.L.Unlock()
	return nil, ctx.Err()
case <-waitChan:
}
When ctx is Done, Acquire still tries to take the lock before returning. Consider this situation: maxSize is already exceeded and Acquire is called again. If no resource has been released by the time ctx times out, the logic above is triggered, and if other resources still have not been released, the call blocks until a new resource is released and the unlock happens. Personally, I believe that in practical applications most statistics are collected by a monitoring system. The Stat here may not be very useful, but it increases the complexity of the locking and the logic.
Do you have an example or test demonstrating this erroneous behavior? The logic appears correct to me.
Yes. I wrote a demo to simulate this scenario.
package main

import (
	"context"
	"fmt"
	"time"

	"github.com/jackc/puddle"
)

func main() {
	p := puddle.NewPool(
		func(ctx context.Context) (interface{}, error) {
			var value int
			return &value, nil
		},
		func(value interface{}) {},
		1,
	)

	// Hold the pool's only resource so every other Acquire has to wait.
	resBg, _ := p.Acquire(context.Background())

	for i := 0; i < 10000; i++ {
		go func() {
			now := time.Now()
			ctx, cancel := context.WithTimeout(context.Background(), 100*time.Millisecond)
			defer cancel()

			_, err := p.Acquire(ctx)
			cost := time.Since(now)
			if cost > 110*time.Millisecond {
				fmt.Printf("err: %v, cost:%v\n", err, cost)
			}
		}()
	}

	time.Sleep(2 * time.Second)
	resBg.Release()
}
On my machine, it has the following output:
err: context deadline exceeded, cost:122.398036ms
err: context deadline exceeded, cost:121.074585ms
err: context deadline exceeded, cost:120.942892ms
err: context deadline exceeded, cost:120.154298ms
err: context deadline exceeded, cost:120.339236ms
err: context deadline exceeded, cost:117.905087ms
err: context deadline exceeded, cost:120.816183ms
...
I think you'll see the same result on your machine.
Actually, nothing printed for me. I had to change the loop to 100000 to get any output.
err: context deadline exceeded, cost:742.378554ms
err: context deadline exceeded, cost:742.298367ms
err: context deadline exceeded, cost:740.731022ms
err: context deadline exceeded, cost:740.374492ms
But I still don't see the problem. It's spinning up 100,000 goroutines contending for a single resource --- plus 100,000 timers --- plus another 100,000 goroutines to funnel p.cond.Wait into a channel. I would expect to see a delay just because of the Go scheduler.
Just for kicks I reran it with the lock and canceledAcquireCount increment commented out. It reduced the "cost" by a couple hundred ms but did not eliminate the behavior you describe. And I really don't want to lose the canceledAcquireCount stats.
I suppose that path could be optimized. Maybe using atomics. But IMO, by the point this becomes an issue the system is already extremely overloaded. The real problem is that something needs to handle backpressure. Ideally this would be done by the caller of Acquire, such that it is never called in an overload situation. This is one of the reasons the canceled acquire stat is important: it is a signal that could be used to detect an overload.
Or the pool could detect an overload situation and immediately drop new Acquire requests. But the strategy used to detect an overload and determine which requests to drop can be complicated and vary by application. puddle is designed to be a minimal, super lightweight pool. I'm not eager to add any features that could be handled by the calling application.
In practice, I use puddle through pgx, and I report the cost of functions like Query and Exec. I found that the reported value can exceed the ctx timeout, sometimes by tens of milliseconds. I understand your design, and it is very good, but maybe there is a better way to implement this. Finally, thank you for replying to me.