flink-kubernetes-operator [FLINK-34906] Only scale when all tasks are running

What is the purpose of the change

Currently, the autoscaler will scale a job when the JobStatus is RUNNING. But the JobStatus will be RUNNING once job starts schedule, so it doesn't mean all tasks are running. Especially, when the resource isn't enough or job recovers from large state.

The autoscaler will throw exception and generate the AutoscalerError event when tasks are not ready. Also, we don't need to scale it when some tasks are not ready.

Brief change log

[FLINK-34906] Only scale when all tasks are running
- Solution: we only scale job that all tasks are running(some of tasks may be finished).

Verifying this change

Manually test is done
Added the JobStatusUtilsTest

Does this pull request potentially affect one of the following parts:

Dependencies (does it add or upgrade a dependency): no
The public API, i.e., is any changes to the CustomResourceDescriptors: no
Core observer or reconciler logic that is regularly executed: no

Documentation

Does this pull request introduce a new feature? no

Mar 21 '24 09:03 1996fanrui

This issue only affects the standalone autoscaler as the kubernetes operator has this logic already in place for setting the RUNNING state. Can we somehow deduplicate this logic?

Mar 21 '24 10:03 gyfora

This issue only affects the standalone autoscaler as the kubernetes operator has this logic already in place for setting the RUNNING state. Can we somehow deduplicate this logic?

Thanks @gyfora provide this information, I didn't notice kubernetes operator has this logic before. Let me look into it.

Mar 21 '24 10:03 1996fanrui

Thanks @mxm for the review and discussion!

This issue only affects the standalone autoscaler as the kubernetes operator has this logic already in place for setting the RUNNING state. Can we somehow deduplicate this logic?

Is that really the case? AFAIK we only check for a RUNNING job state.

AbstractFlinkService#getEffectiveStatus adjusts the JobStatus.RUNNING to JobStatus.CREATED, thanks @gyfora for helping find it. I didn't extract it as a common class due to @gyfora mentioned autoscaler may be moved to the separated repo, so it's better to copy related logic to autoscaler standalone module.

This looks related to #699 which took a different approach by ignoring certain exceptions during the stabilization phase and effectively postponing metric collection.

The adjustment logic is introduced before #699 , it means the some of metrics may be not ready even if all tasks are running(I guess some metrics are generated after running). That's what exactly what #699 solved.

Why do we need to adjust the JobStatus?

If some of tasks are not running, autoscaler doesn't need to call metric collection related logic.
If job.autoscaler.stabilization.interval is set to small value by users, it's easy to throw metric not found exception.
As I understand, job.autoscaler.stabilization.interval hopes to filter out unstable metrics when all tasks just start running.
- For example:
  - job starts(JobStatus is changed to CREATED) at 09:00:00
  - JobStatus is changed to CREATED at 09:01:00
  - all tasks are running at 09:04:00
  - and job.autoscaler.stabilization.interval = 1 min.
- We hopes the stabilization period is 09:04:00 to 09:05:00 instead of 09:01:00 to 09:02:00, or we think the job or metric is stable after 09:05:00, right?
- All tasks are running since 09:04:00, so the metric may be not stable from 09:04:00 to 09:05:00.
- Also, this issue might needs FLINK-34907 as well (I'm preparing the PR).

Please correct me if anything is wrong, thanks a lot.

Mar 25 '24 03:03 1996fanrui

Thanks Rui! The changes make sense to me. To Gyulas point, I think we should try to deduplicate the logic such that both Kubernetes autoscaler and standalone use the same code path.

Mar 27 '24 12:03 mxm

Thanks Rui! The changes make sense to me. To Gyulas point, I think we should try to deduplicate the logic such that both Kubernetes autoscaler and standalone use the same code path.

I could move JobStatusUtils from flink-autoscaler-standalone module to flink-autoscaler module, then both of flink-autoscaler-standalone and flink-kubernetes-operator module can use it.

Hey @gyfora , what do you think about it?

Mar 28 '24 02:03 1996fanrui

sounds good @1996fanrui

Mar 28 '24 05:03 gyfora

Hi @mxm , I merge this PR first, feel free to ping me if you have any comment in the future, thanks~

Apr 16 '24 07:04 1996fanrui

flink-kubernetes-operator flink-kubernetes-operator copied to clipboard

[FLINK-34906] Only scale when all tasks are running

What is the purpose of the change

Brief change log

Verifying this change

Does this pull request potentially affect one of the following parts:

Documentation

flink-kubernetes-operator
flink-kubernetes-operator copied to clipboard