incubator-streampark

[Bug] Flink job failed in K8s mode, but job state shows as finished in StreamX

Open zhuzhihao94 opened this issue 2 years ago • 2 comments

Search before asking

  • [X] I had searched in the issues and found no similar issues.

What happened

A Flink job failed in K8s mode, but StreamX shows its state as finished. The inferred Flink job state appears to be inaccurate.

StreamX Version

1.2.4

Java Version

jdk1.8

Flink Version

1.14.4

Scala Version of Flink

2.12.7

Error Exception

pass

Screenshots

No response

Are you willing to submit PR?

  • [ ] Yes I am willing to submit a PR!

Code of Conduct

zhuzhihao94 Jul 19 '22 11:07

@zhuzhihao94 Hi zhihao, thanks for your feedback. There is a real problem with StreamPark's state tracking for Flink on Kubernetes. It determines the approximate state of the corresponding Flink job pod by listening to Kubernetes events, and periodically polls the Flink REST API to determine the final current state.

But this approach has some edge cases. The Kubernetes event API does not behave entirely consistently across providers. In addition, in Flink Application mode on K8s, when a Flink job ends unexpectedly, the pod is reclaimed and the Flink REST service is terminated with it, so the final state can no longer be read from the REST API.

As a result, when a Flink job in application mode ends, the current Flink on K8s state tracking mechanism cannot be 100% certain whether the final state of the job is FINISHED, CANCELED, or FAILED.
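To make the ambiguity concrete, here is a minimal sketch of combining one REST poll result with one Kubernetes observation. It is not StreamPark's actual tracking code. The function name `infer_job_state`, its parameters, and the `LOST` marker are hypothetical names introduced only for illustration.

```python
from enum import Enum
from typing import Optional


class JobState(Enum):
    RUNNING = "RUNNING"
    FINISHED = "FINISHED"
    CANCELED = "CANCELED"
    FAILED = "FAILED"
    # Hypothetical marker for "the terminal state cannot be determined".
    LOST = "LOST"


def infer_job_state(rest_state: Optional[JobState], pod_exists: bool) -> JobState:
    """Infer a job state from one REST poll and one Kubernetes observation.

    rest_state is None when the Flink REST endpoint could not be reached.
    pod_exists is the (assumed) result of watching Kubernetes events.
    """
    if rest_state is not None:
        # The REST API answered, so its reported state is the best source.
        return rest_state
    if pod_exists:
        # REST is unreachable but the pod is still present, for example
        # while it is starting or restarting. Assume the job is still live.
        return JobState.RUNNING
    # REST is gone and the pod was reclaimed. FINISHED, CANCELED and FAILED
    # are all possible here, so no single terminal state can be claimed.
    return JobState.LOST


if __name__ == "__main__":
    print(infer_job_state(JobState.FAILED, pod_exists=False).value)
    print(infer_job_state(None, pod_exists=True).value)
    print(infer_job_state(None, pod_exists=False).value)
```

The last branch is the case described in this issue: reporting a finished state there would be a guess. Returning a separate marker only makes the uncertainty visible. Resolving it would need an additional source of truth, which is the open design question.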

Al-assad Sep 20 '22 03:09

@wolfboys We need more discussion to address this issue.

Al-assad Sep 20 '22 03:09