zxy
> hi, I met the same problem, in my case, my tensor variables are not in the same device, that's the problem, after I fixed the tensor variables to the...
I also encountered similar issues. When I test the generation throughput on LongBench using the script "scripts/long_test.sh" with the following parameters: ``` gpuid=0 model='JackFram/llama-160m' quant_method='kivi' k_bits=2 v_bits=2 group_size=32 residual_length=128 e=0...
> @CUHKSZzxy @lichongod > > Thank you guys for the detailed benchmark. We also notice it in our previous experiments. The longbench is tested under the batch size == 1...
@Amber-Believe You can refer to the code below and increase the `max_new_tokens` parameter in the generation config; `max_new_tokens` controls the upper bound on the output length. ``` import os from lmdeploy import pipeline, PytorchEngineConfig, GenerationConfig from lmdeploy.vl import load_image os.environ['CUDA_VISIBLE_DEVICES'] = '7' model_path = "xxx" # Configure the...
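To illustrate what `max_new_tokens` does, here is a toy decode loop (a sketch, not lmdeploy internals): the parameter is a hard cap on how many tokens may be emitted, so generation stops at whichever comes first, an end-of-sequence token or the cap. `next_token_fn` and the `EOS` id are hypothetical stand-ins for the model's sampler and tokenizer.

```python
EOS = 0  # hypothetical end-of-sequence token id

def generate(next_token_fn, max_new_tokens):
    """Greedy decode loop bounded by max_new_tokens."""
    out = []
    for _ in range(max_new_tokens):
        tok = next_token_fn()
        if tok == EOS:
            break  # model finished early
        out.append(tok)
    return out

# A "model" that never emits EOS: the output length hits the cap exactly,
# which is why a too-small max_new_tokens silently truncates long answers.
tokens = generate(lambda: 1, max_new_tokens=5)
print(len(tokens))  # 5
```

Raising `max_new_tokens` in `GenerationConfig` only raises this cap; it does not force the model to produce longer outputs.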
Thanks again for your dedicated contributions. While testing the functionality, how are we expected to use this feature? Currently, I use the following commands after launching the API...
Failed with `tp1, dp 32, ep 32`, error info: ``` 2025-04-11 04:15:17,172 - lmdeploy - [37mINFO[0m - async_engine.py:259 - input backend=pytorch, backend_config=PytorchEngineConfig(dtype='auto', tp=0, dp=32, dp_rank=0, ep=32, session_len=None, max_batch_size=128, cache_max_entry_count=0.8, prefill_interval=16,...
@Juniper1021 The error trace is similar to https://github.com/InternLM/lmdeploy/issues/3343. Could you try ``` pip install nvidia-cublas-cu12==12.4.5.8 ```
> [@CUHKSZzxy](https://github.com/CUHKSZzxy) Can Qwen3-235B-A22B-FP8 be used with DP and EP? If so, can you provide an example? 1. You should use TP4 rather than TP8 for Qwen3-235B-A22B-FP8; the weights are...
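One plausible reason TP4 works where TP8 does not (a hedged sketch, not confirmed from this thread): block-quantized FP8 weights are commonly quantized in 128-wide blocks, and each tensor-parallel shard must contain a whole number of blocks. The block size of 128 and the MoE intermediate width of 1536 below are assumptions for illustration.

```python
BLOCK = 128          # assumed FP8 quantization block width
INTERMEDIATE = 1536  # assumed MoE intermediate size

def shard_ok(dim, tp, block=BLOCK):
    """A TP split is valid only if every shard is a whole number of quant blocks."""
    if dim % tp != 0:
        return False
    return (dim // tp) % block == 0

print(shard_ok(INTERMEDIATE, 4))  # True: 1536/4 = 384 = 3 blocks of 128
print(shard_ok(INTERMEDIATE, 8))  # False: 1536/8 = 192 is not a multiple of 128
```

Under these assumed sizes, TP8 would leave each shard with a fractional quantization block, which a loader typically rejects at weight-loading time.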
For InternVL3.5 FP8 quantization, you can try this PR: https://github.com/InternLM/lmdeploy/pull/4018. Other quantization methods may currently have compatibility issues; we have not explicitly verified whether they work.
Thanks for pointing this out and providing your solution. This should be fixed in the above PR.