zeppelin icon indicating copy to clipboard operation
zeppelin copied to clipboard

[ZEPPELIN-6258] Improve Process resource management in SparkInterpreterLauncher

Open renechoi opened this issue 5 months ago • 1 comments

What is this PR for?

This PR improves process resource management in the detectSparkScalaVersion method of SparkInterpreterLauncher.java to prevent process hangs and resource leaks.

Current Issues Fixed:

  1. Buffer Overflow Risk: Process stdout was not consumed, potentially causing hangs if spark-submit produces output
  2. Indefinite Blocking: No timeout on process.waitFor() could cause indefinite blocking
  3. Missing Process Cleanup: Process wasn't explicitly destroyed after use
  4. No Exit Value Validation: Process exit status wasn't checked

What type of PR is it?

Improvement

Todos

  • [ ] - Code review
  • [ ] - CI build verification

What is the Jira issue?

  • https://issues.apache.org/jira/browse/ZEPPELIN-6258

How should this be tested?

  • Unit test added: testDetectSparkScalaVersionProcessManagement - Verifies the method works correctly with new process management

  • Manual testing:

    • Start Zeppelin with Spark interpreter
    • Verify spark-submit processes are properly terminated
    • No zombie processes should remain after interpreter initialization
  • CI: All existing tests pass

Screenshots (if appropriate)

N/A

Questions:

  • Does the license files need to update? No
  • Is there breaking changes for older versions? No
  • Does this needs documentation? No

Implementation Details

  • Added stdout consumption using IOUtils.copy() to NullOutputStream to prevent buffer overflow
  • Implemented 30-second timeout with process.waitFor(30, TimeUnit.SECONDS)
  • Added forceful process termination on timeout
  • Added exit value logging for better diagnostics
  • Ensured process cleanup in finally block with destroyForcibly()
  • Moved file reading into try-finally block for better resource management

Benefits

  1. Prevents Hangs: Stdout consumption prevents buffer-related process hangs
  2. Guaranteed Termination: 30-second timeout ensures process won't block indefinitely
  3. Better Diagnostics: Exit value logging helps identify spark-submit failures
  4. Resource Safety: Explicit process cleanup prevents zombie processes

renechoi avatar Jul 24 '25 15:07 renechoi

Could you please rebase this branch onto master?

ParkGyeongTae avatar Sep 07 '25 23:09 ParkGyeongTae