incubator-streampark
[Bug] Flink job failed in K8s mode, but the job state is shown as finished on StreamX
Search before asking
- [X] I had searched in the issues and found no similar issues.
What happened
A Flink job failed in K8s mode, but StreamX shows its state as finished.
The inferred Flink job state appears to be inaccurate.
StreamX Version
1.2.4
Java Version
jdk1.8
Flink Version
1.14.4
Scala Version of Flink
2.12.7
Error Exception
pass
Screenshots
No response
Are you willing to submit PR?
- [ ] Yes I am willing to submit a PR!
Code of Conduct
- [X] I agree to follow this project's Code of Conduct
@zhuzhihao94 Hi zhihao, thanks for your feedback. There is a real problem with the state tracking of StreamPark Flink on Kubernetes. It determines the approximate state of the corresponding Flink job pod by listening to Kubernetes events, and periodically polls the Flink REST API to determine the final current state.
But this has some edge-case issues. The Kubernetes event API does not behave consistently across providers, and in Flink Application mode on K8s, when a Flink job ends unexpectedly, the pod is reclaimed and the Flink REST service is terminated with it, so the REST API can no longer be polled for the final state.
As you can see, when a Flink job in application mode ends, the current Flink on K8s state tracking mechanism cannot be 100% certain whether the final state of the Flink job is FINISHED, CANCELED, or FAILED.
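To make the ambiguity concrete, here is a minimal sketch of how a tracker might combine the two signals described above. It is not StreamPark's actual implementation: `JobState`, `infer_state`, and the `LOST` fallback state are hypothetical names chosen for illustration. The idea is to report an explicit "unknown" state, rather than guess FINISHED, when both the pod and the REST endpoint are gone and no terminal state was observed beforehand.

```python
from enum import Enum
from typing import Optional


class JobState(Enum):
    RUNNING = "RUNNING"
    FINISHED = "FINISHED"
    CANCELED = "CANCELED"
    FAILED = "FAILED"
    LOST = "LOST"  # hypothetical: the final state could not be determined


TERMINAL_STATES = {JobState.FINISHED, JobState.CANCELED, JobState.FAILED}


def infer_state(
    rest_state: Optional[JobState],  # state from the Flink REST API, None if unreachable
    pod_exists: bool,                # whether Kubernetes still reports the job pod
    last_known: Optional[JobState],  # last state the tracker recorded, if any
) -> JobState:
    # 1. A live REST response is the most direct evidence, so use it as-is.
    if rest_state is not None:
        return rest_state
    # 2. REST is unreachable but the pod is still there (for example while
    #    starting or restarting): keep the last recorded state.
    if pod_exists:
        return last_known or JobState.RUNNING
    # 3. Pod and REST are both gone. Trust only a terminal state that was
    #    actually observed before the pod was reclaimed.
    if last_known in TERMINAL_STATES:
        return last_known
    # 4. Otherwise the outcome is unknown; do not assume FINISHED.
    return JobState.LOST


if __name__ == "__main__":
    # The pod was reclaimed while the tracker still believed the job was running.
    print(infer_state(None, False, JobState.RUNNING))
```

In this sketch the case from this issue (job fails, pod is reclaimed before the next poll) falls into step 4, so the UI would show an undetermined state instead of a misleading success. Recovering the real outcome would need another source, such as the pod's exit status or archived job information, which this sketch does not cover.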
@wolfboys We need more discussion to address this issue.