
Use AI to replace the rule-based autoscaling algorithm

Open · putcn opened this issue 7 years ago · 2 comments

@helinwang and I were talking with the PR folk at Baidu about our autoscaling feature, and he brought up a good point: can we use AI to replace our current rule-based autoscaling algorithm? First of all, this feature is definitely not for this release and will need a lot of discussion. With that in mind, here is what I'm thinking:

  1. Is this possible or doable? Can we gather enough data to train this model? How would we measure whether a decision made by the model is a "good" one?
  2. Do we need to extend our training-job YAML definition so that it exposes more information we can pick up as features to feed the model?
  3. Once the refactoring is finished, can we use the protobuf of the computation graph as input to the model?
  4. Can we estimate the training time needed for a particular training job?
  5. Can we estimate and measure the work (computation effort) needed for a particular training job?

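I don't have answers to the above questions yet, just some immature initial thoughts. I'm noting them down here for future reference and to "抛砖引玉" (literally "toss out a brick to attract jade", i.e. offer a rough idea to draw out better ones) 😉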

putcn, Oct 14 '17 08:10

I think reinforcement learning is perfect for this kind of task (planning).
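To make the reinforcement-learning idea a bit more concrete, here is a minimal sketch of tabular Q-learning on a toy, hypothetical cluster model. Everything in it is an assumption for illustration, not PaddleCloud code: the state is (current trainer count, trainers the pending work demands), the actions are scale down / hold / scale up by one trainer, and the reward penalizes idle trainers and unmet demand with arbitrary weights. A real system would replace the random demand drift in the hypothetical `step` function with measurements from an actual cluster.

```python
import random
from collections import defaultdict

# Hypothetical action space: scale down by one trainer, hold, scale up by one trainer.
ACTIONS = (-1, 0, 1)
MAX_TRAINERS = 8  # hypothetical cluster capacity


def step(trainers, demand, action, rng):
    """Toy, hypothetical cluster simulator (not real cluster behavior)."""
    # Apply the scaling action, keeping at least one trainer and at most capacity.
    trainers = min(MAX_TRAINERS, max(1, trainers + action))
    # Penalize idle trainers (wasted resources) and unmet demand (slow jobs);
    # the 1.0 / 2.0 weights are arbitrary placeholders.
    idle = max(0, trainers - demand)
    unmet = max(0, demand - trainers)
    reward = -(1.0 * idle + 2.0 * unmet)
    # Demand drifts randomly by -1, 0, or +1, clamped to [1, MAX_TRAINERS].
    demand = min(MAX_TRAINERS, max(1, demand + rng.choice((-1, 0, 1))))
    return trainers, demand, reward


def train_q_table(episodes=2000, steps=50, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Standard tabular Q-learning with epsilon-greedy exploration."""
    rng = random.Random(seed)
    q = defaultdict(float)  # maps (state, action) -> estimated discounted return
    for _ in range(episodes):
        trainers = rng.randint(1, MAX_TRAINERS)
        demand = rng.randint(1, MAX_TRAINERS)
        for _ in range(steps):
            state = (trainers, demand)
            # Explore with probability epsilon, otherwise act greedily.
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: q[(state, a)])
            trainers, demand, reward = step(trainers, demand, action, rng)
            next_state = (trainers, demand)
            # Q-learning update toward reward + discounted best next value.
            best_next = max(q[(next_state, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
    return q


def policy(q, trainers, demand):
    """Greedy scaling decision from a learned Q table."""
    return max(ACTIONS, key=lambda a: q[((trainers, demand), a)])
```

For example, `policy(train_q_table(), 2, 6)` returns the learned action for a cluster running 2 trainers while the pending work wants 6. Because unmet demand is weighted twice as heavily as idle trainers, the reward pushes the policy to scale up more eagerly than it scales down; choosing those weights (and what "demand" even means for a training job) is exactly the open question about what counts as a "good" decision.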

helinwang, Oct 16 '17 17:10

Yes, it's possible, but it requires a lot of online cluster data.

typhoonzero, Oct 17 '17 12:10