flink-kubernetes-operator icon indicating copy to clipboard operation
flink-kubernetes-operator copied to clipboard

[FLINK-34906] Only scale when all tasks are running

Open 1996fanrui opened this issue 11 months ago • 6 comments

What is the purpose of the change

Currently, the autoscaler will scale a job when the JobStatus is RUNNING. But the JobStatus will be RUNNING once job starts schedule, so it doesn't mean all tasks are running. Especially, when the resource isn't enough or job recovers from large state.

The autoscaler will throw exception and generate the AutoscalerError event when tasks are not ready. Also, we don't need to scale it when some tasks are not ready.

Brief change log

  • [FLINK-34906] Only scale when all tasks are running
    • Solution: we only scale job that all tasks are running(some of tasks may be finished).

Verifying this change

  • Manually test is done
  • Added the JobStatusUtilsTest

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): no
  • The public API, i.e., is any changes to the CustomResourceDescriptors: no
  • Core observer or reconciler logic that is regularly executed: no

Documentation

  • Does this pull request introduce a new feature? no

1996fanrui avatar Mar 21 '24 09:03 1996fanrui

This issue only affects the standalone autoscaler as the kubernetes operator has this logic already in place for setting the RUNNING state. Can we somehow deduplicate this logic?

gyfora avatar Mar 21 '24 10:03 gyfora

This issue only affects the standalone autoscaler as the kubernetes operator has this logic already in place for setting the RUNNING state. Can we somehow deduplicate this logic?

Thanks @gyfora provide this information, I didn't notice kubernetes operator has this logic before. Let me look into it.

1996fanrui avatar Mar 21 '24 10:03 1996fanrui

Thanks @mxm for the review and discussion!

This issue only affects the standalone autoscaler as the kubernetes operator has this logic already in place for setting the RUNNING state. Can we somehow deduplicate this logic?

Is that really the case? AFAIK we only check for a RUNNING job state.

AbstractFlinkService#getEffectiveStatus adjusts the JobStatus.RUNNING to JobStatus.CREATED, thanks @gyfora for helping find it. I didn't extract it as a common class due to @gyfora mentioned autoscaler may be moved to the separated repo, so it's better to copy related logic to autoscaler standalone module.

This looks related to #699 which took a different approach by ignoring certain exceptions during the stabilization phase and effectively postponing metric collection.

The adjustment logic is introduced before #699 , it means the some of metrics may be not ready even if all tasks are running(I guess some metrics are generated after running). That's what exactly what #699 solved.

Why do we need to adjust the JobStatus?

  • If some of tasks are not running, autoscaler doesn't need to call metric collection related logic.
  • If job.autoscaler.stabilization.interval is set to small value by users, it's easy to throw metric not found exception.
  • As I understand, job.autoscaler.stabilization.interval hopes to filter out unstable metrics when all tasks just start running.
    • For example:
      • job starts(JobStatus is changed to CREATED) at 09:00:00
      • JobStatus is changed to CREATED at 09:01:00
      • all tasks are running at 09:04:00
      • and job.autoscaler.stabilization.interval = 1 min.
    • We hopes the stabilization period is 09:04:00 to 09:05:00 instead of 09:01:00 to 09:02:00, or we think the job or metric is stable after 09:05:00, right?
    • All tasks are running since 09:04:00, so the metric may be not stable from 09:04:00 to 09:05:00.
    • Also, this issue might needs FLINK-34907 as well (I'm preparing the PR).

Please correct me if anything is wrong, thanks a lot.

1996fanrui avatar Mar 25 '24 03:03 1996fanrui

Thanks Rui! The changes make sense to me. To Gyulas point, I think we should try to deduplicate the logic such that both Kubernetes autoscaler and standalone use the same code path.

mxm avatar Mar 27 '24 12:03 mxm

Thanks Rui! The changes make sense to me. To Gyulas point, I think we should try to deduplicate the logic such that both Kubernetes autoscaler and standalone use the same code path.

I could move JobStatusUtils from flink-autoscaler-standalone module to flink-autoscaler module, then both of flink-autoscaler-standalone and flink-kubernetes-operator module can use it.

Hey @gyfora , what do you think about it?

1996fanrui avatar Mar 28 '24 02:03 1996fanrui

sounds good @1996fanrui

gyfora avatar Mar 28 '24 05:03 gyfora

Hi @mxm , I merge this PR first, feel free to ping me if you have any comment in the future, thanks~

1996fanrui avatar Apr 16 '24 07:04 1996fanrui