lpf6
@irexyc I upgraded torch from 2.2.2 to 2.3.1 and also tried `pip3 install nvidia-cublas-cu12==12.3.4.1`, but I still get the same error. I pulled a backtrace with gdb:

```
(gdb) backtrace
#0  0x00007fc857b6aa58 in turbomind::BlockManager::GetBlockCount(unsigned long, double, std::function) (
    block_size=block_size@entry=0, ratio=ratio@entry=0.80000001192092896, get_free_size=...)
    at /lmdeploy/src/turbomind/models/llama/BlockManager.cc:104
#1  0x00007fc857b6bf7b in turbomind::BlockManager::BlockManager(unsigned long, double, int, turbomind::IAllocator*, std::function) (this=0x7fc7d3fa1740,...
```

```
(gdb) up 1
#1  0x00007fc857b6bf7b in turbomind::BlockManager::BlockManager(unsigned long, double, int, turbomind::IAllocator*, std::function) (this=0x7fc7d3fa1740, block_size=0, block_count=0.80000001192092896, chunk_size=-1, allocator=, get_free_size=...)
    at /lmdeploy/src/turbomind/models/llama/BlockManager.cc:35
35      in /lmdeploy/src/turbomind/models/llama/BlockManager.cc
(gdb) up 1
#2  0x00007fc857b6f9c7 in...
```

Note that `BlockManager::GetBlockCount` is being called with `block_size=0` here.
I see it now: [local_kv_head_num_=0](https://github.com/InternLM/lmdeploy/blob/02077a7d03b6bdaf905ba32e8bdd755d41d77401/src/turbomind/models/llama/LlamaBatch.cc#L959) comes from the member initializer in the LlamaV2 constructor, [local_kv_head_num_(kv_head_num / tensor_para.world_size_),](https://github.com/InternLM/lmdeploy/blob/02077a7d03b6bdaf905ba32e8bdd755d41d77401/src/turbomind/models/llama/LlamaV2.cc#L83). gdb shows `kv_head_num=2`, and with `--tp=4` that integer division yields 0:

```
#13 turbomind::LlamaV2::LlamaV2 (this=0x7fc7d3a13a30, head_num=32, kv_head_num=2, size_per_head=, inter_size=, num_layer=40, vocab_size=151552, norm_eps=, attn_params=..., start_id=0, end_id=151329, cache_block_seq_len=64, quant_policy=0, use_context_fmha=true, engine_params=..., lora_params=..., shared_state=std::shared_ptr (use count 9, weak count 0)...
```
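A minimal Python sketch of that integer division, using the values from the gdb frame above and the original `--tp=4` launch command:

```python
# Mirrors the C++ member initializer:
#   local_kv_head_num_(kv_head_num / tensor_para.world_size_)
kv_head_num = 2   # from the gdb frame above
world_size = 4    # from --tp=4

local_kv_head_num = kv_head_num // world_size  # integer division, as in C++
print(local_kv_head_num)  # 0
```

With `local_kv_head_num_ = 0`, the KV-cache block size derived from it presumably ends up as 0 as well, matching the `block_size=0` seen in `GetBlockCount` in frame #0.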
I changed the command from

```
lmdeploy serve api_server Qwen/Qwen2-1.5B --server-port=8000 --tp=4 --model-name=default-model --max-batch-size=32 --session-len=32768 --log-level INFO
```

to

```
lmdeploy serve api_server Qwen/Qwen2-1.5B --server-port=8000 --tp=2 --model-name=default-model --max-batch-size=32 --session-len=32768 --log-level INFO
```

Lowering `--tp=4` to `--tp=2` lets the server start successfully (see the check sketch below), but a new error shows up while handling requests:

```
...
```
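This is consistent with the integer-division issue above: the tensor-parallel size has to divide the model's KV-head count. A hedged sketch of checking that before launching (assumes a Hugging Face-style config exposing `num_key_value_heads`, which Qwen2 configs do; the helper name is illustrative):

```python
# Check whether --tp divides a model's KV-head count before starting the server.
from transformers import AutoConfig

def check_tp(model_id: str, tp: int) -> None:
    cfg = AutoConfig.from_pretrained(model_id)
    kv_heads = cfg.num_key_value_heads
    if kv_heads < tp or kv_heads % tp != 0:
        print(f"{model_id}: num_key_value_heads={kv_heads} does not split across tp={tp}")
    else:
        print(f"{model_id}: num_key_value_heads={kv_heads}, tp={tp} -> {kv_heads // tp} per rank")

check_tp("Qwen/Qwen2-1.5B", 4)  # 2 KV heads, tp=4 -> the failing case above
check_tp("Qwen/Qwen2-1.5B", 2)  # tp=2 -> ok
```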
> Probably the 2080 Ti doesn't support bf16

@lzhangzz Can the bf16 weights be converted to fp16, with `convert` or some other way?
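For reference, one way to do that outside of lmdeploy (a sketch only; the output directory is a hypothetical example) is to re-save the checkpoint in fp16 with transformers:

```python
# Re-save a bf16 checkpoint as fp16 so it can run on GPUs without bf16 support (e.g. 2080 Ti).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

src = "Qwen/Qwen2-1.5B"
dst = "./Qwen2-1.5B-fp16"  # hypothetical output directory

model = AutoModelForCausalLM.from_pretrained(src, torch_dtype=torch.float16)
tokenizer = AutoTokenizer.from_pretrained(src)
model.save_pretrained(dst)
tokenizer.save_pretrained(dst)
# Then point `lmdeploy serve api_server` at dst instead of the original model id.
```

As the reply below notes, this may not be needed, since lmdeploy already falls back to fp16 on GPUs without bf16 support.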
Also, running Qwen/Qwen2-7B works fine:

```
lmdeploy serve api_server Qwen/Qwen2-7B --server-port=8000 --tp=4 --model-name=default-model --max-batch-size=4 --session-len=32768
```

The log also shows:

```
Fetching 14 files: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████| 14/14 [00:00
```
> > > Probably the 2080 Ti doesn't support bf16
> >
> > @lzhangzz Can the bf16 weights be converted to fp16, with `convert` or some other way?
>
> Hi, @lpf6 The 2080 Ti doesn't support bf16, but lmdeploy will force fp16 for inference anyway. As for qwen2-1.5b, on a 2080...
> @DavidPeleg6 Here is my quick sketch at adding support for Qwen2 embedding models #5611
>
> This is not sufficient though since that model you...