Add hotstart_threshold to flame.pool

Open DeemoONeill opened this issue 10 months ago • 4 comments

Upon the concurrency exceeding the threshold, a new node is added to the pool.

This allows for pre-empting load and is intended for use cases where a cold start is costly, such as for machine learning models.

I'm not 100% sure on the implementation of this. Would we want to spawn the new runner, but then do the cond case as usual? That way the waiting job goes into the existing runner, and the new runner can spin up.
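To make the question concrete, here is a minimal sketch of that flow, not FLAME's actual pool code. The module `HotstartSketch`, the function names, the `{:checkout, ...}` / `{:wait, ...}` return shapes, and the reading of `hotstart_threshold` as a fraction of `max_concurrency` are all assumptions made for illustration. It also assumes the pool always has at least one runner.

```elixir
defmodule HotstartSketch do
  # Hypothetical names throughout; none of this is FLAME's real implementation.

  # True when `count` in-flight jobs exceed the fraction `threshold`
  # of a runner's `max_concurrency` (assumption: threshold is in 0..1).
  def over_threshold?(count, max_concurrency, threshold) do
    count > max_concurrency * threshold
  end

  # `runners` is a non-empty map of runner id => in-flight job count.
  # Returns where the incoming job should go, plus whether a new runner
  # should be booted in the background.
  def plan(runners, opts) do
    max_concurrency = Keyword.fetch!(opts, :max_concurrency)
    max = Keyword.fetch!(opts, :max)
    threshold = Keyword.fetch!(opts, :hotstart_threshold)

    # Least-loaded runner, standing in for the "min runner" in the thread.
    {min_id, min_count} = Enum.min_by(runners, fn {_id, count} -> count end)

    # Boot early if accepting this job pushes the least-loaded runner
    # past the threshold and the pool is still allowed to grow.
    boot? =
      map_size(runners) < max and
        over_threshold?(min_count + 1, max_concurrency, threshold)

    # The usual decision still runs, so the job lands on an existing runner.
    cond do
      min_count < max_concurrency -> {:checkout, min_id, boot?}
      true -> {:wait, boot?}
    end
  end
end
```

In this sketch the boot decision is computed separately from job placement, so the waiting job is never held back for the new runner.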

Also, the new runner will most likely become the new min_runner, which means potentially not utilising the previous min runner to its maximum potential. Would this be an issue?

related to https://github.com/phoenixframework/flame/issues/30

DeemoONeill avatar Apr 22 '24 18:04 DeemoONeill

@DeemoONeill this is great! This is something I've wanted to contribute to for a while, just wasn't sure where to start. You've also raised questions that hadn't occurred to me. I was thinking of using behaviours or a protocol and passing in the module or MFA instead. This would allow for custom growth strategies and flexible configuration.

Good callout on min_runner. It feels like the growth strategy and work distribution need to be separate concerns. At the cost of adding an extra dependency, is this something that GenStage could be used for? FLAME becomes a producer, and each runner is a consumer.

Thank you for putting some time into this! I'd love to pair with you on this if you ever want another pair of eyes.

samharnack avatar May 12 '24 14:05 samharnack

@samharnack apologies, I missed this.

Do you mean you were thinking of having something like a "startup_strategy" behaviour? That might actually be a good approach.

Have a default behaviour which behaves as it does now, spinning up AT capacity. Then have a hotstart behaviour which spins up at a percentage of max capacity. That way it's opt-in to the downsides, and it gives the option of user-defined behaviours which use some other heuristics for when to spin up a new machine.

I don't have much capacity until after the weekend, but would be happy to go through some ideas with you.
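As a rough sketch of what such a behaviour could look like: the module names, the `grow?/2` callback, and the shape of the pool state map below are hypothetical and are not taken from FLAME. The threshold is assumed to be a fraction of `max_concurrency`.

```elixir
defmodule StrategySketch do
  # Hypothetical behaviour: decide whether the pool should boot another runner.
  @callback grow?(pool :: map(), opts :: keyword()) :: boolean()
end

defmodule StrategySketch.Default do
  @behaviour StrategySketch

  # Mirrors "spinning up AT capacity": grow only once the runner is full.
  @impl true
  def grow?(%{count: count, max_concurrency: max_concurrency}, _opts) do
    count >= max_concurrency
  end
end

defmodule StrategySketch.Hotstart do
  @behaviour StrategySketch

  # Grow once the runner reaches a fraction of its capacity.
  # With no threshold given this falls back to the default behaviour.
  @impl true
  def grow?(%{count: count, max_concurrency: max_concurrency}, opts) do
    threshold = Keyword.get(opts, :hotstart_threshold, 1.0)
    count >= max_concurrency * threshold
  end
end
```

A user-defined strategy would implement the same callback with whatever heuristic it likes, which keeps the downsides of early spin-up opt-in.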

DeemoONeill avatar May 24 '24 12:05 DeemoONeill

@DeemoONeill you are in good company, I guess I don't have GitHub notifications enabled :/

That's exactly what I was thinking. The config would turn into something like this:

children = [
  ...,
  {FLAME.Pool, name: MyRunner, min: 1, max: 10, max_concurrency: 100, strategy: {CustomStrategy, [hotstart_threshold: 0.5]}}
]
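For illustration only, here is one way a pool could accept either a bare module or a `{module, opts}` tuple for such a `:strategy` option. `StrategyOptionSketch`, `normalize/1`, and the `grow?/2` callback are hypothetical names, and the `CustomStrategy` module below is a stand-in so the snippet runs on its own.

```elixir
defmodule StrategyOptionSketch do
  # Hypothetical helper; this is not an existing FLAME.Pool API.

  # A bare module is treated as {module, []}.
  def normalize(module) when is_atom(module), do: normalize({module, []})

  # Validate that the module is loadable and exports the assumed callback.
  def normalize({module, opts}) when is_atom(module) and is_list(opts) do
    if Code.ensure_loaded?(module) and function_exported?(module, :grow?, 2) do
      {module, opts}
    else
      raise ArgumentError, "#{inspect(module)} must export grow?/2"
    end
  end
end

defmodule CustomStrategy do
  # Stand-in strategy that never asks the pool to grow.
  def grow?(_pool, _opts), do: false
end
```

Normalizing once at pool start would let the rest of the pool always call `module.grow?(pool, opts)` without caring which form the user passed.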

I think a first step would be extracting the current scaling logic into a default strategy and getting it merged into main.

I'll try to spike an idea this weekend.

samharnack avatar Jun 06 '24 13:06 samharnack

Hi @samharnack @DeemoONeill, I've implemented the idea being discussed in #51, if you want to take a look.

nickdichev-firework avatar Aug 18 '24 20:08 nickdichev-firework