[ZEPPELIN-4031] Fix failure to detect that the interpreter process was killed manually
What is this PR for?
Zeppelin-server cannot detect when an interpreter process has become unavailable, whether because of a network exception or a program exception. As a result, every task the user submits keeps failing, and the interpreter process has to be restarted before the user's interpreter works normally again, which makes for a bad user experience.
With this PR, Zeppelin server checks the state of the remote interpreter process. When it finds that the interpreter process is in an abnormal state, it cleans up that interpreter's session so the interpreter becomes available again. This improves the user experience and also reduces the operations and maintenance burden of the system.
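A minimal sketch of the detect-and-clean-up idea described above, assuming a map from session id to a process handle. The class name `InterpreterSessionMonitor`, the interface `InterpreterProcessHandle`, and its `isRunning()`/`close()` methods are hypothetical stand-ins for illustration, not Zeppelin's actual classes or the code in this PR.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class InterpreterSessionMonitor {

    /** Hypothetical handle to a remote interpreter process (not Zeppelin's real API). */
    public interface InterpreterProcessHandle {
        boolean isRunning();

        // Release client-side resources; no-op by default in this sketch.
        default void close() {}
    }

    // sessionId -> process currently serving that session
    private final Map<String, InterpreterProcessHandle> sessions = new ConcurrentHashMap<>();

    public void register(String sessionId, InterpreterProcessHandle process) {
        sessions.put(sessionId, process);
    }

    public boolean hasSession(String sessionId) {
        return sessions.containsKey(sessionId);
    }

    /**
     * Removes every session whose process is no longer running, so the next
     * request can start a fresh process instead of failing forever.
     * Returns how many sessions were cleaned up.
     */
    public int cleanUpDeadSessions() {
        int removed = 0;
        // ConcurrentHashMap iterators are weakly consistent, so removing while iterating is safe.
        for (Map.Entry<String, InterpreterProcessHandle> e : sessions.entrySet()) {
            InterpreterProcessHandle p = e.getValue();
            // remove(key, value) only succeeds if no one replaced the entry in the meantime
            if (!p.isRunning() && sessions.remove(e.getKey(), p)) {
                p.close();
                removed++;
            }
        }
        return removed;
    }
}
```

In this sketch the check could run periodically or before each paragraph execution; either way, a killed process no longer leaves its session stuck.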
What type of PR is it?
[Bug Fix]
What is the Jira issue?
- https://issues.apache.org/jira/browse/ZEPPELIN-4031
How should this be tested?
RemoteInterpreterTest::testDetectIntpProcessKilled()
Screenshots (if appropriate)
Before bug fix

After bug fix

Questions:
- Do the license files need an update?
- Are there breaking changes for older versions?
- Does this need documentation?
@zjffdu, @felixcheung, @jongyoul, please help me review the code. Thanks!
This bug is hard to understand through code review alone; the better way is to verify it by testing.
OK, thanks for explaining. I'm OK with this. IMO it might be worthwhile to refactor the code to make it more straightforward, perhaps?
There is no longer any need to click a second time to execute successfully: the interpreter process is restored immediately.

CI passes: https://travis-ci.org/liuxunorg/zeppelin/builds/512089147 @zjffdu, @felixcheung, please help me review the code. In the second commit, I changed how the invalid remote interpreter process is recreated. An invalid remote interpreter is now repaired cleanly and can be used again immediately, without any manual intervention. See the GIF: https://github.com/apache/zeppelin/pull/3342#issuecomment-477046336
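The "usable immediately, no second click" behavior can be sketched as a get-or-recreate lookup: before running a paragraph, the dead process is replaced within the same request rather than failing first. This is a minimal sketch under assumptions; `RecreatingProcessPool`, `ManagedProcess`, and the `Supplier`-based launcher are hypothetical names, not the second commit's actual code.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

public class RecreatingProcessPool {

    /** Hypothetical stand-in for a remote interpreter process. */
    public interface ManagedProcess {
        boolean isRunning();
    }

    // interpreter group id -> its current process
    private final Map<String, ManagedProcess> processes = new HashMap<>();
    // Launches a new process; in a real system this would start the remote interpreter.
    private final Supplier<ManagedProcess> launcher;

    public RecreatingProcessPool(Supplier<ManagedProcess> launcher) {
        this.launcher = launcher;
    }

    /**
     * Returns a running process for the group. If none exists, or the cached one
     * has died (e.g. it was killed manually), a new one is launched right away,
     * so the caller's request succeeds on the first attempt.
     */
    public synchronized ManagedProcess getOrRecreate(String groupId) {
        ManagedProcess current = processes.get(groupId);
        if (current != null && current.isRunning()) {
            return current;
        }
        ManagedProcess fresh = launcher.get();
        processes.put(groupId, fresh);
        return fresh;
    }
}
```

The method is `synchronized` so two concurrent paragraph runs cannot both launch a replacement; the trade-off is that lookups wait while a process is being launched.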