sglang
Unable to run Qwen successfully
Environment: 2 × RTX 2080 Ti, CUDA 12.3 (cuda_12.3.r12.3/compiler.33567101_0), Python 3.9, sglang installed with: pip install "sglang[all]"
Error:
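A minimal sketch for collecting the same environment details in a repeatable form. It assumes the relevant distributions are named sglang, torch and triton (triton is included because the assertion comes from mlir::triton code); torch is imported only if present, to report the CUDA version it was built with and each GPU's compute capability. The helper names are hypothetical, not part of sglang.

```python
import importlib.metadata
import platform
import sys


def package_version(name: str) -> str:
    """Return the installed version of a distribution, or 'not installed'."""
    try:
        return importlib.metadata.version(name)
    except importlib.metadata.PackageNotFoundError:
        return "not installed"


def collect_env(packages=("sglang", "torch", "triton")) -> dict:
    """Gather Python, platform, package and (if torch is available) GPU details."""
    info = {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
    }
    for pkg in packages:
        info[pkg] = package_version(pkg)
    try:
        import torch  # optional: only used to query CUDA and the GPUs
    except ImportError:
        return info
    if torch.cuda.is_available():
        info["torch_cuda"] = torch.version.cuda
        for i in range(torch.cuda.device_count()):
            major, minor = torch.cuda.get_device_capability(i)
            info[f"gpu{i}"] = f"{torch.cuda.get_device_name(i)} (sm_{major}{minor})"
    return info


if __name__ == "__main__":
    for key, value in collect_env().items():
        print(f"{key}: {value}")
```

Pasting this output alongside the log gives maintainers the exact Triton version and GPU architecture, which the nvcc version string alone does not show.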
new fill batch. #seq: 1. #cached_token: 0. #new_token: 8. #remaining_req: 0. #running_req: 0. tree_cache_hit_rate: 0.00%.
python: /project/lib/Analysis/Allocation.cpp:40: std::pair<llvm::SmallVector
Detailed log:
(sglang2) ma@ubuntu-server:~$ python -m sglang.launch_server --model-path Qwen/Qwen1.5-0.5B --host 0.0.0.0 --port 1235 --mem-fraction-static 0.9 --tp 2
config.json: 661B [00:00, 48.4kB/s]
tokenizer_config.json: 1.16kB [00:00, 105kB/s]
vocab.json: 2.78MB [00:00, 6.09MB/s]
merges.txt: 1.67MB [00:00, 7.41MB/s]
tokenizer.json: 7.03MB [00:01, 6.60MB/s]
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
server started on [0.0.0.0]:10008
server started on [0.0.0.0]:10009
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
accepted ('127.0.0.1', 38192) with fd 30
welcome ('127.0.0.1', 38192)
accepted ('127.0.0.1', 56310) with fd 26
welcome ('127.0.0.1', 56310)
Rank 0: load weight begin.
Rank 1: load weight begin.
INFO 02-17 04:29:34 weight_utils.py:163] Using model weights format ['.safetensors']
INFO 02-17 04:29:34 weight_utils.py:163] Using model weights format ['.safetensors']
model.safetensors: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1.24G/1.24G [01:52<00:00, 11.0MB/s]
Rank 1: load weight end.
Rank 0: load weight end.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Rank 0: max_total_num_token=382390, max_prefill_num_token=63731, context_len=32768, model_mode=[]
Rank 1: max_total_num_token=382390, max_prefill_num_token=63731, context_len=32768, model_mode=[]
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
INFO: Started server process [135494]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://0.0.0.0:1235 (Press CTRL+C to quit)
INFO: 127.0.0.1:56426 - "GET /get_model_info HTTP/1.1" 200 OK
new fill batch. #seq: 1. #cached_token: 0. #new_token: 8. #remaining_req: 0. #running_req: 0. tree_cache_hit_rate: 0.00%.
python: /project/lib/Analysis/Allocation.cpp:40: std::pair<llvm::SmallVector<unsigned int>, llvm::SmallVector<unsigned int> > mlir::triton::getCvtOrder(mlir::Attribute, mlir::Attribute): Assertion `!(srcMmaLayout && dstMmaLayout) && "Unexpected mma -> mma layout conversion"' failed.
python: /project/lib/Analysis/Allocation.cpp:40: std::pair<llvm::SmallVector<unsigned int>, llvm::SmallVector<unsigned int> > mlir::triton::getCvtOrder(mlir::Attribute, mlir::Attribute): Assertion `!(srcMmaLayout && dstMmaLayout) && "Unexpected mma -> mma layout conversion"' failed.
Process Process-1:
Traceback (most recent call last):
File "/home/ma/ENTER/envs/sglang2/lib/python3.9/multiprocessing/process.py", line 315, in _bootstrap
self.run()
File "/home/ma/ENTER/envs/sglang2/lib/python3.9/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/home/ma/ENTER/envs/sglang2/lib/python3.9/site-packages/sglang/srt/managers/router/manager.py", line 79, in start_router_process
loop.run_until_complete(router.loop_for_forward())
File "uvloop/loop.pyx", line 1517, in uvloop.loop.Loop.run_until_complete
File "/home/ma/ENTER/envs/sglang2/lib/python3.9/site-packages/sglang/srt/managers/router/manager.py", line 38, in loop_for_forward
out_pyobjs = await self.model_client.step(next_step_input)
File "/home/ma/ENTER/envs/sglang2/lib/python3.9/site-packages/sglang/srt/managers/router/model_rpc.py", line 635, in _func
await asyncio.gather(*[asyncio.to_thread(t.wait) for t in tasks])
File "/home/ma/ENTER/envs/sglang2/lib/python3.9/asyncio/threads.py", line 25, in to_thread
return await loop.run_in_executor(None, func_call)
File "/home/ma/ENTER/envs/sglang2/lib/python3.9/concurrent/futures/thread.py", line 58, in run
result = self.fn(*self.args, **self.kwargs)
File "/home/ma/ENTER/envs/sglang2/lib/python3.9/site-packages/rpyc/core/async.py", line 51, in wait
self._conn.serve(self._ttl)
File "/home/ma/ENTER/envs/sglang2/lib/python3.9/site-packages/rpyc/core/protocol.py", line 438, in serve
data = self._channel.poll(timeout) and self._channel.recv()
File "/home/ma/ENTER/envs/sglang2/lib/python3.9/site-packages/rpyc/core/channel.py", line 55, in recv
header = self.stream.read(self.FRAME_HEADER.size)
File "/home/ma/ENTER/envs/sglang2/lib/python3.9/site-packages/rpyc/core/stream.py", line 280, in read
raise EOFError("connection closed by peer")
EOFError: connection closed by peer
HTTPConnectionPool(host='0.0.0.0', port=1235): Read timed out. (read timeout=60)