On a T4 GPU, startup fails with both the vllm and sglang backends
Launch command: flashtts serve --model_path /app/ckpt/Spark-TTS-0.5B --backend sglang --role_dir data/roles --llm_device cuda --tokenizer_device cuda --detokenizer_device cuda --wav2vec_attn_implementation sdpa --llm_attn_implementation sdpa --torch_dtype "bfloat16" --max_length 1024 --llm_gpu_memory_utilization 0.8 --fix_voice --host 0.0.0.0 --port
Error:
[2025-05-27 15:51:54] Scheduler hit an exception: Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/sglang/srt/managers/scheduler.py", line 2269, in run_scheduler_process
scheduler.event_loop_overlap()
File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/sglang/srt/managers/scheduler.py", line 683, in event_loop_overlap
self.process_batch_result(
File "/usr/local/lib/python3.10/dist-packages/sglang/srt/managers/scheduler.py", line 1594, in process_batch_result
self.process_batch_result_prefill(batch, result, launch_done)
File "/usr/local/lib/python3.10/dist-packages/sglang/srt/managers/scheduler_output_processor_mixin.py", line 94, in process_batch_result_prefill
self.tree_cache.cache_unfinished_req(req)
File "/usr/local/lib/python3.10/dist-packages/sglang/srt/mem_cache/radix_cache.py", line 218, in cache_unfinished_req
page_aligned_kv_indices = kv_indices.clone()
RuntimeError: CUDA error: device-side assert triggered
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
[2025-05-27 15:51:54] TpModelWorkerClient hit an exception: Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/sglang/srt/managers/tp_worker_overlap_thread.py", line 118, in forward_thread_func
self.forward_thread_func_()
File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/sglang/srt/managers/tp_worker_overlap_thread.py", line 151, in forward_thread_func_
self.worker.forward_batch_generation(
File "/usr/local/lib/python3.10/dist-packages/sglang/srt/managers/tp_worker.py", line 211, in forward_batch_generation
next_token_ids = self.model_runner.sample(
File "/usr/local/lib/python3.10/dist-packages/sglang/srt/model_executor/model_runner.py", line 1161, in sample
next_token_ids = self.sampler(
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1750, in _call_impl
return forward_call(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/sglang/srt/layers/sampler.py", line 79, in forward
logits.div(sampling_info.temperatures)
RuntimeError: CUDA error: device-side assert triggered
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
[2025-05-27 15:51:54] Received sigquit from a child process. It usually means the child failed.
terminate called after throwing an instance of 'c10::Error'
Killed
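
One thing that may be worth checking, though the log above does not confirm it as the cause: the command passes --torch_dtype "bfloat16", and the T4 has CUDA compute capability 7.5, while native bfloat16 support starts at compute capability 8.0 (Ampere). The sketch below picks a dtype name from the compute capability. The helper name suggest_dtype is hypothetical, and whether flashtts accepts "float16" for --torch_dtype is an assumption to verify against its own help output.

```python
def suggest_dtype(capability):
    """Return a dtype name for a CUDA compute capability given as (major, minor).

    Hypothetical helper for illustration; not part of flashtts, sglang, or vllm.
    """
    major, _minor = capability
    # Native bfloat16 arithmetic is available from compute capability 8.0 onward.
    # Older GPUs such as the T4 (7.5) are usually run in float16 instead.
    return "bfloat16" if major >= 8 else "float16"


if __name__ == "__main__":
    try:
        import torch  # optional: only needed to query the local GPU
    except ImportError:
        torch = None

    if torch is not None and torch.cuda.is_available():
        cap = torch.cuda.get_device_capability(0)
        print(cap, "->", suggest_dtype(cap))
    else:
        # Fall back to the T4's known compute capability.
        print((7, 5), "->", suggest_dtype((7, 5)))
```

Because CUDA errors are reported asynchronously, the Python frames in the traceback (kv_indices.clone() and logits.div) may not be where the assert actually fired. Rerunning with the environment variable CUDA_LAUNCH_BLOCKING=1 set makes kernel launches synchronous, which usually gives a more accurate stack trace.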