
Starting members not actually limited to batch size of init_count


This is not a bug per se, just something I wanted to bring to your attention -- on the off chance it's unintentional or has potential to cause anybody any grief.

Pooler grows the member pool by batches of init_count, but that is not strictly enforced. I don't necessarily think it should be enforced, but some interesting behavior might arise in certain cases because it's not.

What I mean by not strictly enforced is: https://github.com/seth/pooler/blob/master/src/pooler.erl#L560

Here's a sample case to illustrate the point. All members are currently in use, but there is room for growth:

init_count = 10
max_count = 100
free_pids = []
in_use_count = 10
starting_members = []

Somebody tries to take a member. That results in add_members_async(10, ...) starting up another batch of 10, and it returns error_no_members to the caller. All good so far, and now:

starting_members = [...10 elements...]

Immediately thereafter (prior to any of the new batch being fully started/added/accepted), another caller tries to take a member. This gets evaluated:

NumToAdd = max(min(InitCount - NonStaleStartingMemberCount, NumCanAdd), 1)

...which equates to max(min(10 - 10, 80), 1) = 1, so NumToAdd = 1. That results in add_members_async(1, ...) starting up a single additional member, and it again returns error_no_members to the caller.
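For concreteness, here is the same arithmetic checked in an Erlang shell with the values from this scenario (NonStale stands in for NonStaleStartingMemberCount; these are plain expressions, not pooler internals):

    1> InitCount = 10, NumCanAdd = 80, NonStale = 10.
    10
    2> max(min(InitCount - NonStale, NumCanAdd), 1).
    1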

Now we have 11 members being started. This may not be a problem per se, but if (a) new members take long enough to start, and (b) consumers are trying to take a member at a fast enough rate, the number of starting members can grow quite large. The worst case (I believe) is that you can have up to max_count - init_count members starting at once.

Together, (a) and (b) are the perfect storm that can potentially overload things.

This may be a non-issue, but I wanted to bring it to your attention. If this ever bites anybody, the fix would be simple: place a configurable hard limit on it, e.g. max_starting_count. It could default to something like max(init_count, 1) out of the box, and users could raise or lower it as they see fit. A rough sketch of the idea follows.
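Something along these lines, as a minimal sketch only (the variable names follow the existing take_member path; max_starting_count is the proposed knob, and the surrounding plumbing is illustrative, not actual pooler code):

    %% Hypothetical: cap concurrently-starting members at MaxStartingCount,
    %% a proposed config value defaulting to max(InitCount, 1).
    %% NumCanAdd and NonStaleStartingMemberCount are the values pooler
    %% already computes on this code path.
    MaxStartingCount = max(InitCount, 1),
    Headroom = max(MaxStartingCount - NonStaleStartingMemberCount, 0),
    NumToAdd = min(Headroom, NumCanAdd),
    case NumToAdd > 0 of
        true  -> add_members_async(NumToAdd, Pool);
        false -> ok   %% a full batch is already starting; don't overshoot
    end

The key change from the current line is that NumToAdd is allowed to be 0, so a caller arriving mid-batch no longer forces an extra starting member.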

What do you think?

dcheckoway avatar Jan 06 '15 17:01 dcheckoway

Hi there,

I've only had a chance to read your report quickly, but it looks to me like you've found a bug and something that could be a problem. Perhaps it would be worth adding the config you suggest.

@oferrigni thoughts?

seth avatar Jan 07 '15 04:01 seth

@seth @oferrigni FWIW, I've been seeing this "member leakage" behavior in our production environment. We have a setup where init_count is 1000, and we can see our pool grow like:

  • 1000
  • 2001
  • 3003
  • 4008
  • 5017

The overflow corresponds to the number of consumers that tried to take a member while pooler was off starting up a new batch of members.
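To see how the extras accumulate, here is a back-of-the-envelope simulation of the current growth rule. The function is illustrative only (not pooler code): it applies the NumToAdd expression once per caller that arrives while members are still starting, with an arbitrarily large max_count:

    %% Illustrative: each mid-batch caller bumps Starting by the
    %% current NumToAdd rule.
    simulate(Takes, InitCount, MaxCount, InUse) ->
        lists:foldl(
          fun(_, Starting) ->
              CanAdd = MaxCount - (InUse + Starting),
              Starting + max(min(InitCount - Starting, CanAdd), 1)
          end,
          0,
          lists:seq(1, Takes)).

    %% simulate(4, 1000, 100000, 1000) -> 1003: one batch of 1000 plus
    %% one extra starting member per mid-batch caller.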

If you're not opposed to my suggestion above, I'll submit a PR...

dcheckoway avatar Jan 25 '15 13:01 dcheckoway