
LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.

Results: 125 lightllm issues, sorted by recently updated

**Before you submit an issue, please search for existing issues to avoid duplicates.** **Issue description:** LightLLM's `models` directory contains `qwen_wquant`; which versions of the Qwen model does this code support? I downloaded both Qwen-7b-chat-AWQ and Qwen1.5-7b-chat-AWQ locally, and both fail with `AttributeError: 'QwenTransformerLayerWeightQuantized' object has no attribute 'q_weight_'`. Please provide a clear and concise description of...

bug

**Issue description:** For Llava 1.5 13b, even when run with the `--tokenizer_mode "auto"` flag set, it still prints a message saying the slow tokenizer is being used. Llava has...

bug

In a container created from the image, I used the command below: ``` docker run -d --privileged --runtime nvidia --gpus all -p 9012:8000 \ -v /root/lightllm/:/app/ \ -v /root/models/:/data/ \ --name lightllm-qwen \...

bug

Thanks for your great work! Here is my question: say we get a batch of inputs with lengths L1, L2, .... How can we simultaneously compute the attention scores of these inputs by...
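One common approach to the question above (a minimal NumPy sketch, not LightLLM's actual Triton kernels) is to pack the sequences back-to-back with no padding and attend only within each sequence's own span; the `seq_lens` list marks where each sequence starts and ends:

```python
import numpy as np

def varlen_attention(q, k, v, seq_lens):
    """Attention over a packed batch of variable-length sequences.

    q, k, v: (total_tokens, d) arrays holding all sequences
    concatenated back-to-back; seq_lens: list of per-sequence lengths.
    Each sequence attends only to its own tokens.
    """
    outs = []
    start = 0
    for length in seq_lens:
        qi = q[start:start + length]
        ki = k[start:start + length]
        vi = v[start:start + length]
        scores = qi @ ki.T / np.sqrt(qi.shape[-1])
        # Numerically stable softmax over each row.
        probs = np.exp(scores - scores.max(axis=-1, keepdims=True))
        probs /= probs.sum(axis=-1, keepdims=True)
        outs.append(probs @ vi)
        start += length
    return np.concatenate(outs, axis=0)
```

A production kernel fuses this loop into one GPU launch (FlashAttention-style "varlen" kernels take a cumulative-lengths array instead of a Python loop), but the masking logic is the same.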

Hi maintainers, `MiniCPM-V-2` performs quite well. Are there any plans to support it?

**Before you submit an issue, please search for existing issues to avoid duplicates.** **Issue description:** There is already a `lightllm` package on PyPI, which makes things somewhat confusing. Please...

bug

https://github.com/ModelTC/lightllm/blob/main/lightllm/common/basemodel/triton_kernel/dequantize_gemm_int4.py The algorithm in the above file implements weight-only int4, but its speed is only 50% of the CUTLASS int4 kernel. How can this be resolved?
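For context on what that kernel computes: weight-only int4 stores two 4-bit weights per byte and recovers floats via an affine transform before (or fused into) the GEMM. Below is a minimal NumPy sketch of just the dequantization step; the `scale`/`zero` parameter names are illustrative and not the actual API of `dequantize_gemm_int4.py`, and a real kernel applies per-group scales and fuses the unpack into the matmul:

```python
import numpy as np

def dequantize_int4(packed: np.ndarray, scale: float, zero: int) -> np.ndarray:
    """Unpack two 4-bit weights per byte and dequantize them as
    w_fp = (w_int4 - zero) * scale.

    packed: 1-D uint8 array; low nibble holds the even-indexed weight,
    high nibble the odd-indexed one (a common AWQ/GPTQ layout).
    """
    low = (packed & 0x0F).astype(np.int32)   # even-indexed 4-bit values
    high = (packed >> 4).astype(np.int32)    # odd-indexed 4-bit values
    w = np.empty(packed.size * 2, dtype=np.int32)
    w[0::2] = low
    w[1::2] = high
    return (w - zero) * scale
```

The performance gap in the question usually comes from doing this unpack in a separate pass over global memory; fused kernels (as in CUTLASS's mixed-input GEMM) dequantize in registers right before the multiply-accumulate.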