Neither vllm nor sglang starts successfully on a T4 GPU

Open wmj9346464543 opened this issue 7 months ago • 0 comments

Launch command:

flashtts serve --model_path /app/ckpt/Spark-TTS-0.5B --backend sglang --role_dir data/roles --llm_device cuda --tokenizer_device cuda --detokenizer_device cuda --wav2vec_attn_implementation sdpa --llm_attn_implementation sdpa --torch_dtype "bfloat16" --max_length 1024 --llm_gpu_memory_utilization 0.8 --fix_voice --host 0.0.0.0 --port

Error:

[2025-05-27 15:51:54] Scheduler hit an exception: Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/sglang/srt/managers/scheduler.py", line 2269, in run_scheduler_process
    scheduler.event_loop_overlap()
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/sglang/srt/managers/scheduler.py", line 683, in event_loop_overlap
    self.process_batch_result(
  File "/usr/local/lib/python3.10/dist-packages/sglang/srt/managers/scheduler.py", line 1594, in process_batch_result
    self.process_batch_result_prefill(batch, result, launch_done)
  File "/usr/local/lib/python3.10/dist-packages/sglang/srt/managers/scheduler_output_processor_mixin.py", line 94, in process_batch_result_prefill
    self.tree_cache.cache_unfinished_req(req)
  File "/usr/local/lib/python3.10/dist-packages/sglang/srt/mem_cache/radix_cache.py", line 218, in cache_unfinished_req
    page_aligned_kv_indices = kv_indices.clone()
RuntimeError: CUDA error: device-side assert triggered
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

[2025-05-27 15:51:54] TpModelWorkerClient hit an exception: Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/sglang/srt/managers/tp_worker_overlap_thread.py", line 118, in forward_thread_func
    self.forward_thread_func_()
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/sglang/srt/managers/tp_worker_overlap_thread.py", line 151, in forward_thread_func_
    self.worker.forward_batch_generation(
  File "/usr/local/lib/python3.10/dist-packages/sglang/srt/managers/tp_worker.py", line 211, in forward_batch_generation
    next_token_ids = self.model_runner.sample(
  File "/usr/local/lib/python3.10/dist-packages/sglang/srt/model_executor/model_runner.py", line 1161, in sample
    next_token_ids = self.sampler(
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1750, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/sglang/srt/layers/sampler.py", line 79, in forward
    logits.div(sampling_info.temperatures)
RuntimeError: CUDA error: device-side assert triggered
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

[2025-05-27 15:51:54] Received sigquit from a child process. It usually means the child failed.
terminate called after throwing an instance of 'c10::Error'
Killed

May 27 '25 15:05 wmj9346464543