Many problems encountered while reproducing from source
Running LightLLM
Reproducing the kvoff branch
Step 1: create the Docker container
Pull the image: docker pull ghcr.io/modeltc/lightllm:main
The llama-7b model is large, and cloning it directly inside the server's Docker container kept failing with network interruptions, so I downloaded the model to my local machine, transferred it to the server with Xftp, and mapped the model folder into the models folder of the lightllm source tree when creating the container.
Model repository: [huggyllama/llama-7b · Hugging Face](https://huggingface.co/huggyllama/llama-7b)
docker run -itd --ipc=host --net=host --name lxn_lightllm --gpus all -p 8080:8080 -v /hdd/lxn/llama-7b:/lightllm/lightllm/models/llama-7b ghcr.io/modeltc/lightllm:main /bin/bash
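Before launching anything, it is worth confirming that the bind mount landed where --model_dir will look. A minimal check from inside the container (the path mirrors the -v mapping in the docker run command above):

```python
# Sanity-check the mounted model folder from inside the container.
import os

model_dir = "/lightllm/lightllm/models/llama-7b"
print(os.listdir(model_dir))  # expect config.json, tokenizer files, *.bin shards
```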
Step 2: run
Install from source:
python setup.py install
Run the model:
python -m lightllm.server.api_server --model_dir models/llama-7b --host 0.0.0.0 --port 8080 --tp 1 --max_total_token_num 120000
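If the server comes up, it can be smoke-tested with a request like the one below. The endpoint and JSON shape follow the example request in the lightllm README; this is a sketch and may not apply to the kvoff branch:

```python
# Smoke-test the running api_server; request shape per the lightllm README.
import requests

resp = requests.post(
    "http://127.0.0.1:8080/generate",
    json={"inputs": "What is AI?", "parameters": {"max_new_tokens": 17}},
)
print(resp.status_code, resp.text)
```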
Error message: CUDA OOM
load model error: CUDA out of memory. Tried to allocate 938.00 MiB (GPU 0; 31.75 GiB total capacity; 30.87 GiB already allocated; 97.94 MiB free; 30.87 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF CUDA out of memory. Tried to allocate 938.00 MiB (GPU 0; 31.75 GiB total capacity; 30.87 GiB already allocated; 97.94 MiB free; 30.87 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF <class 'torch.cuda.OutOfMemoryError'>
Process Process-1:
Traceback (most recent call last):
File "/opt/conda/lib/python3.9/site-packages/lightllm-1.0.0-py3.9.egg/lightllm/server/router/manager.py", line 257, in start_router_process
asyncio.run(router.wait_to_model_ready())
File "/opt/conda/lib/python3.9/asyncio/runners.py", line 44, in run
return loop.run_until_complete(main)
File "uvloop/loop.pyx", line 1517, in uvloop.loop.Loop.run_until_complete
File "/opt/conda/lib/python3.9/site-packages/lightllm-1.0.0-py3.9.egg/lightllm/server/router/manager.py", line 62, in wait_to_model_ready
await asyncio.gather(*init_model_ret)
File "/opt/conda/lib/python3.9/site-packages/lightllm-1.0.0-py3.9.egg/lightllm/server/router/model_infer/model_rpc.py", line 229, in init_model
ans : rpyc.AsyncResult = self._init_model(rank_id, world_size, weight_dir, max_total_token_num, load_way, mode)
File "/opt/conda/lib/python3.9/site-packages/lightllm-1.0.0-py3.9.egg/lightllm/server/router/model_infer/model_rpc.py", line 97, in exposed_init_model
raise e
File "/opt/conda/lib/python3.9/site-packages/lightllm-1.0.0-py3.9.egg/lightllm/server/router/model_infer/model_rpc.py", line 68, in exposed_init_model
self.model = LlamaTpPartModel(rank_id, world_size, weight_dir, max_total_token_num, load_way, mode)
File "/opt/conda/lib/python3.9/site-packages/lightllm-1.0.0-py3.9.egg/lightllm/models/llama/model.py", line 35, in __init__
super().__init__(tp_rank, world_size, weight_dir, max_total_token_num, load_way, mode, weight_dict, finetune_config)
File "/opt/conda/lib/python3.9/site-packages/lightllm-1.0.0-py3.9.egg/lightllm/common/basemodel/basemodel.py", line 40, in __init__
self._init_mem_manager()
File "/opt/conda/lib/python3.9/site-packages/lightllm-1.0.0-py3.9.egg/lightllm/models/llama/model.py", line 56, in _init_mem_manager
self.mem_manager = self.memory_manager_class(self.max_total_token_num,
File "/opt/conda/lib/python3.9/site-packages/lightllm-1.0.0-py3.9.egg/lightllm/common/mem_manager.py", line 10, in __init__
self._init_buffers(size, dtype, head_num, head_dim, layer_num)
File "/opt/conda/lib/python3.9/site-packages/lightllm-1.0.0-py3.9.egg/lightllm/common/mem_manager.py", line 14, in _init_buffers
self.key_buffer = [torch.empty((size, head_num, head_dim), dtype=dtype, device="cuda") for _ in range(layer_num)]
File "/opt/conda/lib/python3.9/site-packages/lightllm-1.0.0-py3.9.egg/lightllm/common/mem_manager.py", line 14, in <listcomp>
self.key_buffer = [torch.empty((size, head_num, head_dim), dtype=dtype, device="cuda") for _ in range(layer_num)]
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 938.00 MiB (GPU 0; 31.75 GiB total capacity; 30.87 GiB already allocated; 97.94 MiB free; 30.87 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/opt/conda/lib/python3.9/multiprocessing/process.py", line 315, in _bootstrap
self.run()
File "/opt/conda/lib/python3.9/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/opt/conda/lib/python3.9/site-packages/lightllm-1.0.0-py3.9.egg/lightllm/server/router/manager.py", line 260, in start_router_process
err_str = '\n'.join(traceback.format_exception(e))
TypeError: format_exception() missing 2 required positional arguments: 'value' and 'tb'
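As an aside, the final TypeError only masks the real error: on Python 3.9, traceback.format_exception() still requires the (type, value, tb) triple, and the one-argument form used in manager.py was only added in Python 3.10, so the error handler crashes while formatting the original OOM exception. A version-safe sketch of what that handler presumably intends (not the upstream fix):

```python
# On Python 3.9, traceback.format_exception(e) raises TypeError because
# the single-exception form only exists from Python 3.10 on.
import traceback

def format_exc_compat(e: BaseException) -> str:
    return "".join(traceback.format_exception(type(e), e, e.__traceback__))
```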
I then changed max_total_token_num from 120000 to 6000 and the OOM error disappeared, but one of the three errors below now appears at random on every run. Searching Google for similar errors turned up nothing that solved it.
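Before looking at those errors, note that the OOM numbers are consistent with how mem_manager.py pre-allocates the KV cache: one (size, head_num, head_dim) buffer per layer for keys and another for values. Assuming llama-7b's standard config (32 layers, 32 heads, head_dim 128, fp16), a single layer's key buffer at max_total_token_num=120000 is 983,040,000 bytes, about 937.5 MiB, matching the 938.00 MiB allocation in the traceback; the full cache would need roughly 58 GiB on top of about 13 GiB of weights, while 6000 tokens need only about 3 GiB:

```python
# Back-of-envelope KV-cache size for the buffers allocated in
# mem_manager.py, assuming llama-7b: 32 layers, 32 heads, head_dim 128,
# fp16 (2 bytes per element).
LAYERS, HEADS, HEAD_DIM, DTYPE_BYTES = 32, 32, 128, 2

def kv_cache_gib(max_total_token_num: int) -> float:
    per_layer_buf = max_total_token_num * HEADS * HEAD_DIM * DTYPE_BYTES
    return per_layer_buf * LAYERS * 2 / 1024**3  # x2: key + value buffers

print(kv_cache_gib(120000))  # ~58.6 GiB -- far beyond a 32 GiB GPU
print(kv_cache_gib(6000))    # ~2.9 GiB -- fits next to ~13 GiB of weights
```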
Error 1:
Process Process-1:
Traceback (most recent call last):
File "/opt/conda/lib/python3.9/multiprocessing/process.py", line 315, in _bootstrap
self.run()
File "/opt/conda/lib/python3.9/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/opt/conda/lib/python3.9/site-packages/lightllm-1.0.0-py3.9.egg/lightllm/server/router/manager.py", line 270, in start_router_process
loop.run_until_complete(router.loop_for_netio_req())
File "uvloop/loop.pyx", line 1517, in uvloop.loop.Loop.run_until_complete
File "/opt/conda/lib/python3.9/site-packages/lightllm-1.0.0-py3.9.egg/lightllm/server/router/manager.py", line 221, in loop_for_netio_req
recv_req = await self.recv_from_httpserver.recv_pyobj()
File "/opt/conda/lib/python3.9/site-packages/zmq/_future.py", line 356, in _chain
loaded = load(buf)
_pickle.UnpicklingError: could not find MARK
Error 2:
Process Process-1:
Traceback (most recent call last):
File "/opt/conda/lib/python3.9/multiprocessing/process.py", line 315, in _bootstrap
self.run()
File "/opt/conda/lib/python3.9/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/opt/conda/lib/python3.9/site-packages/lightllm-1.0.0-py3.9.egg/lightllm/server/router/manager.py", line 270, in start_router_process
loop.run_until_complete(router.loop_for_netio_req())
File "uvloop/loop.pyx", line 1517, in uvloop.loop.Loop.run_until_complete
File "/opt/conda/lib/python3.9/site-packages/lightllm-1.0.0-py3.9.egg/lightllm/server/router/manager.py", line 221, in loop_for_netio_req
recv_req = await self.recv_from_httpserver.recv_pyobj()
File "/opt/conda/lib/python3.9/site-packages/zmq/_future.py", line 356, in _chain
loaded = load(buf)
_pickle.UnpicklingError: invalid load key, 'n'.
Error 3:
Process Process-1:
Traceback (most recent call last):
File "/opt/conda/lib/python3.9/multiprocessing/process.py", line 315, in _bootstrap
self.run()
File "/opt/conda/lib/python3.9/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/opt/conda/lib/python3.9/site-packages/lightllm-1.0.0-py3.9.egg/lightllm/server/router/manager.py", line 270, in start_router_process
loop.run_until_complete(router.loop_for_netio_req())
File "uvloop/loop.pyx", line 1517, in uvloop.loop.Loop.run_until_complete
File "/opt/conda/lib/python3.9/site-packages/lightllm-1.0.0-py3.9.egg/lightllm/server/router/manager.py", line 221, in loop_for_netio_req
recv_req = await self.recv_from_httpserver.recv_pyobj()
File "/opt/conda/lib/python3.9/site-packages/zmq/_future.py", line 356, in _chain
loaded = load(buf)
_pickle.UnpicklingError: invalid load key, '"'
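All three tracebacks die inside recv_pyobj(), which unpickles whatever bytes arrive on the router's ZMQ socket. Errors like "could not find MARK" and "invalid load key" mean the received payload was never a pickle at all, which usually points at some other peer (another process, an HTTP client, a stale connection) writing to the same port. A minimal reproduction of the symptom, assuming only that pyzmq is installed (the port number is hypothetical):

```python
# Sending non-pickle bytes to a socket whose reader calls recv_pyobj()
# reproduces the UnpicklingError seen above.
import zmq

ctx = zmq.Context()
pull = ctx.socket(zmq.PULL)
pull.bind("tcp://127.0.0.1:15555")  # hypothetical free port

push = ctx.socket(zmq.PUSH)
push.connect("tcp://127.0.0.1:15555")
push.send(b'GET / HTTP/1.1\r\n')  # plain text, not a pickled object

try:
    pull.recv_pyobj()
except Exception as e:
    print(type(e).__name__, e)  # UnpicklingError: invalid load key, 'G'
```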
The api_server fails to run:
Maintainer: This branch has not been tested with the server; you can check whether the tests run without problems.
Author: So for now, is it only possible to run the "Static inference performance" part of the README (on the kvoff branch)?
Maintainer: Yes; serving performance was limited, so we did not implement it further.
I downloaded the Chinese-LLaMA-2-1.3B model from the Hugging Face site and then ran test/model/test_llama2.py, which produced the following error:
root@gpu0:/lightllm/test/model# python test_llama2.py
python: /project/lib/Analysis/Allocation.cpp:40: std::pair<llvm::SmallVector<unsigned int>, llvm::SmallVector<unsigned int> > mlir::triton::getCvtOrder(mlir::Attribute, mlir::Attribute): Assertion `!(srcMmaLayout && dstMmaLayout) && "Unexpected mma -> mma layout conversion"' failed.
F
======================================================================
FAIL: test_llama2_infer (__main__.TestLlama2Infer)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/lightllm/test/model/test_llama2.py", line 11, in test_llama2_infer
test_model_inference(world_size=1,
File "/lightllm/test/model/model_infer.py", line 16, in test_model_inference
assert not ans_queue.empty()
AssertionError
----------------------------------------------------------------------
Ran 1 test in 9.372s
FAILED (failures=1)
This problem is similar to another issue, but that issue does not contain a detailed solution.
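For reference, the assertion at the top of this failure is raised inside the Triton compiler (Allocation.cpp belongs to Triton's MLIR backend), so the worker process aborts while compiling a kernel and the test only ever sees an empty ans_queue. A reasonable first diagnostic step is to check which Triton version the container ships, since layout-conversion assertions like this one are compiler-version-dependent:

```python
# Check the installed Triton version inside the container; the failing
# assertion lives in Triton's Allocation.cpp, not in lightllm itself.
import triton
print(triton.__version__)
```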