sglang

Unable to run qwen successfully

Open maxin9966 opened this issue 1 year ago • 15 comments

env:
2080Ti * 2
cuda_12.3.r12.3/compiler.33567101_0
python3.9
pip install "sglang[all]"

error:
new fill batch. #seq: 1. #cached_token: 0. #new_token: 8. #remaining_req: 0. #running_req: 0. tree_cache_hit_rate: 0.00%.
python: /project/lib/Analysis/Allocation.cpp:40: std::pair<llvm::SmallVector<unsigned int>, llvm::SmallVector<unsigned int> > mlir::triton::getCvtOrder(mlir::Attribute, mlir::Attribute): Assertion `!(srcMmaLayout && dstMmaLayout) && "Unexpected mma -> mma layout conversion"' failed.

detailed log:
(sglang2) ma@ubuntu-server:~$ python -m sglang.launch_server --model-path Qwen/Qwen1.5-0.5B --host 0.0.0.0 --port 1235 --mem-fraction-static 0.9 --tp 2
config.json: 661B [00:00, 48.4kB/s]
tokenizer_config.json: 1.16kB [00:00, 105kB/s]
vocab.json: 2.78MB [00:00, 6.09MB/s]
merges.txt: 1.67MB [00:00, 7.41MB/s]
tokenizer.json: 7.03MB [00:01, 6.60MB/s]
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
server started on [0.0.0.0]:10008
server started on [0.0.0.0]:10009
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
accepted ('127.0.0.1', 38192) with fd 30
welcome ('127.0.0.1', 38192)
accepted ('127.0.0.1', 56310) with fd 26
welcome ('127.0.0.1', 56310)
Rank 0: load weight begin.
Rank 1: load weight begin.
INFO 02-17 04:29:34 weight_utils.py:163] Using model weights format ['.safetensors']
INFO 02-17 04:29:34 weight_utils.py:163] Using model weights format ['.safetensors']
model.safetensors: 100%|████████████████████| 1.24G/1.24G [01:52<00:00, 11.0MB/s]
Rank 1: load weight end.
Rank 0: load weight end.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Rank 0: max_total_num_token=382390, max_prefill_num_token=63731, context_len=32768, model_mode=[]
Rank 1: max_total_num_token=382390, max_prefill_num_token=63731, context_len=32768, model_mode=[]
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
INFO: Started server process [135494]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://0.0.0.0:1235 (Press CTRL+C to quit)
INFO: 127.0.0.1:56426 - "GET /get_model_info HTTP/1.1" 200 OK
new fill batch. #seq: 1. #cached_token: 0. #new_token: 8. #remaining_req: 0. #running_req: 0. tree_cache_hit_rate: 0.00%.
python: /project/lib/Analysis/Allocation.cpp:40: std::pair<llvm::SmallVector<unsigned int>, llvm::SmallVector<unsigned int> > mlir::triton::getCvtOrder(mlir::Attribute, mlir::Attribute): Assertion `!(srcMmaLayout && dstMmaLayout) && "Unexpected mma -> mma layout conversion"' failed.
python: /project/lib/Analysis/Allocation.cpp:40: std::pair<llvm::SmallVector<unsigned int>, llvm::SmallVector<unsigned int> > mlir::triton::getCvtOrder(mlir::Attribute, mlir::Attribute): Assertion `!(srcMmaLayout && dstMmaLayout) && "Unexpected mma -> mma layout conversion"' failed.
Process Process-1:
Traceback (most recent call last):
  File "/home/ma/ENTER/envs/sglang2/lib/python3.9/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/home/ma/ENTER/envs/sglang2/lib/python3.9/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/home/ma/ENTER/envs/sglang2/lib/python3.9/site-packages/sglang/srt/managers/router/manager.py", line 79, in start_router_process
    loop.run_until_complete(router.loop_for_forward())
  File "uvloop/loop.pyx", line 1517, in uvloop.loop.Loop.run_until_complete
  File "/home/ma/ENTER/envs/sglang2/lib/python3.9/site-packages/sglang/srt/managers/router/manager.py", line 38, in loop_for_forward
    out_pyobjs = await self.model_client.step(next_step_input)
  File "/home/ma/ENTER/envs/sglang2/lib/python3.9/site-packages/sglang/srt/managers/router/model_rpc.py", line 635, in _func
    await asyncio.gather([asyncio.to_thread(t.wait) for t in tasks])
  File "/home/ma/ENTER/envs/sglang2/lib/python3.9/asyncio/threads.py", line 25, in to_thread
    return await loop.run_in_executor(None, func_call)
  File "/home/ma/ENTER/envs/sglang2/lib/python3.9/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/home/ma/ENTER/envs/sglang2/lib/python3.9/site-packages/rpyc/core/async.py", line 51, in wait
    self._conn.serve(self._ttl)
  File "/home/ma/ENTER/envs/sglang2/lib/python3.9/site-packages/rpyc/core/protocol.py", line 438, in serve
    data = self._channel.poll(timeout) and self._channel.recv()
  File "/home/ma/ENTER/envs/sglang2/lib/python3.9/site-packages/rpyc/core/channel.py", line 55, in recv
    header = self.stream.read(self.FRAME_HEADER.size)
  File "/home/ma/ENTER/envs/sglang2/lib/python3.9/site-packages/rpyc/core/stream.py", line 280, in read
    raise EOFError("connection closed by peer")
EOFError: connection closed by peer
HTTPConnectionPool(host='0.0.0.0', port=1235): Read timed out. (read timeout=60)

maxin9966, Feb 17 '24 04:02