
[Bug] DeepSeek-V3 raises EOFError

Open m404notfound opened this issue 11 months ago • 2 comments

Checklist

  • [ ] 1. I have searched related issues but cannot get the expected help.
  • [ ] 2. The bug has not been fixed in the latest version.
  • [ ] 3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.
  • [ ] 4. If the issue you raised is not a bug but a question, please raise a discussion at https://github.com/sgl-project/sglang/discussions/new/choose instead. Otherwise, it will be closed.
  • [ ] 5. Please use English; otherwise, the issue will be closed.

Describe the bug

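The server fails at startup: the rank 0 scheduler dies with a floating point exception during CUDA graph capture, and the launcher then raises EOFError. Seen on sglang v0.4.2 post3, v0.4.2 post2, v0.4.1 post2, and v0.4.1 post3, on H20 GPUs with tp8.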

cmd: python3 -m sglang.launch_server --model-path /model/DeepSeek-V3-Base --host 0.0.0.0 --port 6178 --tp-size 8 --mem-fraction-static 0.95 --trust-remote-code --context-length 16384

Fatal Python error: Floating point exception

Thread 0x00007f9cecc03700 (most recent call first):
  File "/usr/lib/python3.10/threading.py", line 324 in wait
  File "/usr/lib/python3.10/threading.py", line 607 in wait
  File "/usr/local/lib/python3.10/dist-packages/tqdm/_monitor.py", line 60 in run
  File "/usr/lib/python3.10/threading.py", line 1016 in _bootstrap_inner
  File "/usr/lib/python3.10/threading.py", line 973 in _bootstrap

Thread 0x00007f9cbcfcd700 (most recent call first):
  File "/usr/lib/python3.10/threading.py", line 324 in wait
  File "/usr/lib/python3.10/threading.py", line 607 in wait
  File "/usr/local/lib/python3.10/dist-packages/tqdm/_monitor.py", line 60 in run
  File "/usr/lib/python3.10/threading.py", line 1016 in _bootstrap_inner
  File "/usr/lib/python3.10/threading.py", line 973 in _bootstrap

Thread 0x00007f9ec0f9c700 (most recent call first):
  File "/usr/local/lib/python3.10/dist-packages/torch/_inductor/compile_worker/subproc_pool.py", line 47 in _recv_msg
  File "/usr/local/lib/python3.10/dist-packages/torch/_inductor/compile_worker/subproc_pool.py", line 153 in _read_thread
  File "/usr/lib/python3.10/threading.py", line 953 in run
  File "/usr/lib/python3.10/threading.py", line 1016 in _bootstrap_inner
  File "/usr/lib/python3.10/threading.py", line 973 in _bootstrap

Current thread 0x00007fa24e72a740 (most recent call first):
  File "/sgl-workspace/test/sglang/python/sglang/srt/layers/logits_processor.py", line 237 in _get_logits
  File "/sgl-workspace/test/sglang/python/sglang/srt/layers/logits_processor.py", line 170 in forward
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1747 in _call_impl
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1736 in _wrapped_call_impl
  File "/sgl-workspace/test/sglang/python/sglang/srt/models/deepseek_v2.py", line 859 in forward
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 116 in decorate_context
  File "/sgl-workspace/test/sglang/python/sglang/srt/model_executor/cuda_graph_runner.py", line 368 in run_once
  File "/sgl-workspace/test/sglang/python/sglang/srt/model_executor/cuda_graph_runner.py", line 375 in capture_one_batch_size
  File "/sgl-workspace/test/sglang/python/sglang/srt/model_executor/cuda_graph_runner.py", line 298 in capture
  File "/sgl-workspace/test/sglang/python/sglang/srt/model_executor/cuda_graph_runner.py", line 232 in __init__
  File "/sgl-workspace/test/sglang/python/sglang/srt/model_executor/model_runner.py", line 730 in init_cuda_graphs
  File "/sgl-workspace/test/sglang/python/sglang/srt/model_executor/model_runner.py", line 215 in __init__
  File "/sgl-workspace/test/sglang/python/sglang/srt/managers/tp_worker.py", line 68 in __init__
  File "/sgl-workspace/test/sglang/python/sglang/srt/managers/tp_worker_overlap_thread.py", line 63 in __init__
  File "/sgl-workspace/test/sglang/python/sglang/srt/managers/scheduler.py", line 240 in __init__
  File "/sgl-workspace/test/sglang/python/sglang/srt/managers/scheduler.py", line 1787 in run_scheduler_process
  File "/usr/lib/python3.10/multiprocessing/process.py", line 108 in run
  File "/usr/lib/python3.10/multiprocessing/process.py", line 314 in _bootstrap
  File "/usr/lib/python3.10/multiprocessing/spawn.py", line 129 in _main
  File "/usr/lib/python3.10/multiprocessing/spawn.py", line 116 in spawn_main
  File "<string>", line 1 in <module>

Extension modules: numpy.core._multiarray_umath, numpy.core._multiarray_tests, numpy.linalg._umath_linalg, numpy.fft._pocketfft_internal, numpy.random._common, numpy.random.bit_generator, numpy.random._bounded_integers, numpy.random._mt19937, numpy.random.mtrand, numpy.random._philox, numpy.random._pcg64, numpy.random._sfc64, numpy.random._generator, charset_normalizer.md, multidict._multidict, yarl._quoting_c, propcache._helpers_c, aiohttp._http_writer, aiohttp._http_parser, aiohttp._websocket.mask, aiohttp._websocket.reader_c, frozenlist._frozenlist, uvloop.loop, torch._C, torch._C._dynamo.autograd_compiler, torch._C._dynamo.eval_frame, torch._C._dynamo.guards, torch._C._dynamo.utils, torch._C._fft, torch._C._linalg, torch._C._nested, torch._C._nn, torch._C._sparse, torch._C._special, psutil._psutil_linux, psutil._psutil_posix, setproctitle, zmq.backend.cython._zmq, yaml._yaml, markupsafe._speedups, PIL._imaging, PIL._imagingft, msgspec._core, sentencepiece._sentencepiece, regex._regex, msgpack._cmsgpack, google._upb._message, ray._raylet, numba.core.typeconv._typeconv, numba._helperlib, numba._dynfunc, numba._dispatcher, numba.core.runtime._nrt_python, numba.np.ufunc._internal, numba.experimental.jitclass._box, pyarrow.lib, pandas._libs.tslibs.ccalendar, pandas._libs.tslibs.np_datetime, pandas._libs.tslibs.dtypes, pandas._libs.tslibs.base, pandas._libs.tslibs.nattype, pandas._libs.tslibs.timezones, pandas._libs.tslibs.fields, pandas._libs.tslibs.timedeltas, pandas._libs.tslibs.tzconversion, pandas._libs.tslibs.timestamps, pandas._libs.properties, pandas._libs.tslibs.offsets, pandas._libs.tslibs.strptime, pandas._libs.tslibs.parsing, pandas._libs.tslibs.conversion, pandas._libs.tslibs.period, pandas._libs.tslibs.vectorized, pandas._libs.ops_dispatch, pandas._libs.missing, pandas._libs.hashtable, pandas._libs.algos, pandas._libs.interval, pandas._libs.lib, pyarrow._compute, pandas._libs.ops, pandas._libs.hashing, pandas._libs.arrays, pandas._libs.tslib, 
pandas._libs.sparse, pandas._libs.internals, pandas._libs.indexing, pandas._libs.index, pandas._libs.writers, pandas._libs.join, pandas._libs.window.aggregations, pandas._libs.window.indexers, pandas._libs.reshape, pandas._libs.groupby, pandas._libs.json, pandas._libs.parsers, pandas._libs.testing, pyarrow._parquet, pyarrow._fs, pyarrow._azurefs, pyarrow._hdfs, pyarrow._gcsfs, pyarrow._s3fs, xxhash._xxhash, pyarrow._json, pyarrow._acero, pyarrow._csv, pyarrow._dataset, pyarrow._dataset_orc, pyarrow._parquet_encryption, pyarrow._dataset_parquet_encryption, pyarrow._dataset_parquet, cuda_utils, __triton_launcher (total: 114)

[2025-02-08 02:07:24] Rank 0 scheduler is dead. Please check if there are relevant logs.
[2025-02-08 02:07:26] Exit code: -8
Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/sgl-workspace/test/sglang/python/sglang/launch_server.py", line 14, in <module>
    launch_server(server_args)
  File "/sgl-workspace/test/sglang/python/sglang/srt/entrypoints/http_server.py", line 491, in launch_server
    tokenizer_manager, scheduler_info = _launch_subprocesses(server_args=server_args)
  File "/sgl-workspace/test/sglang/python/sglang/srt/entrypoints/engine.py", line 434, in _launch_subprocesses
    data = scheduler_pipe_readers[i].recv()
  File "/usr/lib/python3.10/multiprocessing/connection.py", line 250, in recv
    buf = self._recv_bytes()
  File "/usr/lib/python3.10/multiprocessing/connection.py", line 414, in _recv_bytes
    buf = self._recv(4)
  File "/usr/lib/python3.10/multiprocessing/connection.py", line 383, in _recv
    raise EOFError
EOFError
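
For anyone reading the last traceback: the EOFError looks like a symptom rather than the root cause. multiprocessing reports a child killed by a signal as a negative exit code, so -8 corresponds to signal 8 (SIGFPE on Linux), and Connection.recv() raises EOFError when the other end of the pipe is closed with nothing left to read. Below is a minimal sketch of that mechanism, not sglang's code. The names child and run_demo are hypothetical, the "fork" start method is chosen only to keep the demo in a single file (POSIX only), and the parent closing its own copy of the writer is an assumption of the sketch.

```python
import multiprocessing as mp
import os
import signal


def child(writer):
    # Simulate a native crash: deliver SIGFPE to this process before it
    # sends anything through the pipe (hypothetical stand-in for the
    # scheduler dying during initialization).
    os.kill(os.getpid(), signal.SIGFPE)
    writer.send("ready")  # never reached


def run_demo():
    # "fork" is an assumption made for a single-file demo; POSIX only.
    ctx = mp.get_context("fork")
    reader, writer = ctx.Pipe(duplex=False)
    proc = ctx.Process(target=child, args=(writer,))
    proc.start()
    # The parent drops its copy of the write end, so the pipe reports
    # end-of-file once the child is gone.
    writer.close()
    try:
        reader.recv()
        got_eof = False
    except EOFError:
        got_eof = True
    proc.join()
    return got_eof, proc.exitcode


if __name__ == "__main__":
    got_eof, exit_code = run_demo()
    # A negative exit code -N means the child was terminated by signal N.
    print(got_eof, exit_code, signal.Signals(-exit_code).name)
```

In the sketch, the parent only learns that the pipe closed. The actual failure is whatever killed the child, which in this report is the floating point exception in the current-thread stack above.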

Reproduction

Launch the server with the command shown under Describe the bug. The rank 0 scheduler dies during CUDA graph capture and the launcher exits with the EOFError traceback shown there.

Environment

sglang: v0.4.2 post3, v0.4.2 post2, v0.4.1 post2, v0.4.1 post3

Hardware: H20, tp8

Python: 3.10 (per the paths in the traceback)

m404notfound, Feb 08 '25 10:02