sglang icon indicating copy to clipboard operation
sglang copied to clipboard

Fix maximum recursion depth triggered on exception exit

Open kebe7jun opened this issue 10 months ago • 7 comments

Fix https://github.com/sgl-project/sglang/issues/3518

Motivation

Modifications

Checklist

  • [X] Format your code according to the Code Formatting with Pre-Commit.
  • [ ] Add unit tests as outlined in the Running Unit Tests.
  • [ ] Update documentation / docstrings / example tutorials as needed, according to Writing Documentation.
  • [ ] Provide throughput / latency benchmark results and accuracy evaluation results as needed, according to Benchmark and Profiling and Accuracy Results.
  • [ ] For reviewers: If you haven't made any contributions to this PR and are only assisting with merging the main branch, please remove yourself as a co-author when merging the PR.

kebe7jun avatar Feb 12 '25 08:02 kebe7jun

Hey I don't think this fix the error:

Your current patch only addresses scenarios where SIGKILL might not be effective by adding an additional SIGQUIT attempt. It does not prevent psutil from entering a deep recursion when invoked in a signal handler to inspect the current process’s children.

To fully resolve the RecursionError it's better to refactor the signal-handling approach—avoiding complex psutil calls inside the signal handler—and/or update how you gather and kill child processes.

zhaochenyang20 avatar Feb 14 '25 00:02 zhaochenyang20

Thanks, I will review it later and modify it.

kebe7jun avatar Feb 17 '25 01:02 kebe7jun

@kebe7jun Have you done it?

zhaochenyang20 avatar Feb 17 '25 17:02 zhaochenyang20

I tried reproducing locally, but without success, can you provide some inspirations? This patch should solve the problem.

kebe7jun avatar Feb 18 '25 02:02 kebe7jun

@kebe7jun I will merge it thanks.

zhaochenyang20 avatar Feb 18 '25 03:02 zhaochenyang20

@zhaochenyang20 hi, can this pr be merged?

kebe7jun avatar Feb 21 '25 05:02 kebe7jun

@kebe7jun If the CI goese well. I will merge it later...

zhaochenyang20 avatar Feb 21 '25 08:02 zhaochenyang20

@zhaochenyang20 ci goes well.

kebe7jun avatar Feb 25 '25 12:02 kebe7jun

@kebe7jun merged! Thanks!

zhaochenyang20 avatar Feb 25 '25 17:02 zhaochenyang20

please which version can run now?

krmao avatar Feb 27 '25 03:02 krmao

the latest commit on main should be fine. We haven't release a new verison to debug it.

zhaochenyang20 avatar Feb 27 '25 09:02 zhaochenyang20