
LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.

Results: 125 lightllm issues, sorted by recently updated

**Before you submit an issue, please search for existing issues to avoid duplicates.** **Issue description:** LightLLM's `models` directory contains `qwen_wquant`; which versions of the Qwen model does this code support? I downloaded both Qwen-7b-chat-AWQ and Qwen1.5-7b-chat-AWQ locally, and both fail with `AttributeError: 'QwenTransformerLayerWeightQuantized' object has no attribute 'q_weight_'`. Please provide a clear and concise description of...

bug

**Issue description:** For Llava 1.5 13b, even when run with the `--tokenizer_mode "auto"` flag set, it still prints a message saying the slow tokenizer is being used. Llava has...

bug

In a container created from the image, I used the command below: ``` docker run -d --privileged --runtime nvidia --gpus all -p 9012:8000 \ -v /root/lightllm/:/app/ \ -v /root/models/:/data/ \ --name lightllm-qwen \...

bug

Thanks for your great work! Here is my question: say we get a batch of inputs with lengths L1, L2, .... How can we simultaneously compute the attention scores of these inputs by...
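One common approach to the question above (a minimal NumPy sketch, not LightLLM's actual Triton kernels) is to pack the sequences back-to-back with no padding and attend only within each sequence's own span; the `seq_lens` list marks where each sequence starts and ends:

```python
import numpy as np

def varlen_attention(q, k, v, seq_lens):
    """Attention over a packed batch of variable-length sequences.

    q, k, v: (total_tokens, d) arrays holding all sequences
    concatenated back-to-back; seq_lens: list of per-sequence lengths.
    Each sequence attends only to its own tokens.
    """
    outs = []
    start = 0
    for length in seq_lens:
        qi = q[start:start + length]
        ki = k[start:start + length]
        vi = v[start:start + length]
        scores = qi @ ki.T / np.sqrt(qi.shape[-1])
        # Numerically stable softmax over each row.
        probs = np.exp(scores - scores.max(axis=-1, keepdims=True))
        probs /= probs.sum(axis=-1, keepdims=True)
        outs.append(probs @ vi)
        start += length
    return np.concatenate(outs, axis=0)
```

A production kernel fuses this loop into one GPU launch (FlashAttention-style "varlen" kernels take a cumulative-lengths array instead of a Python loop), but the masking logic is the same.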

Hi maintainers, `MiniCPM-V-2` performs quite well. Are there any plans to support it?

**Before you submit an issue, please search for existing issues to avoid duplicates.** **Issue description:** There is already a `lightllm` package on PyPI, which makes things somewhat confusing. Please...

bug

https://github.com/ModelTC/lightllm/blob/main/lightllm/common/basemodel/triton_kernel/dequantize_gemm_int4.py The algorithm in the above file implements weight-only int4, but its speed is only 50% of the CUTLASS int4 kernel. How can this be resolved?
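For context on what that kernel computes: weight-only int4 stores two 4-bit weights per byte and recovers floats via an affine transform before (or fused into) the GEMM. Below is a minimal NumPy sketch of just the dequantization step; the `scale`/`zero` parameter names are illustrative and not the actual API of `dequantize_gemm_int4.py`, and a real kernel applies per-group scales and fuses the unpack into the matmul:

```python
import numpy as np

def dequantize_int4(packed: np.ndarray, scale: float, zero: int) -> np.ndarray:
    """Unpack two 4-bit weights per byte and dequantize them as
    w_fp = (w_int4 - zero) * scale.

    packed: 1-D uint8 array; low nibble holds the even-indexed weight,
    high nibble the odd-indexed one (a common AWQ/GPTQ layout).
    """
    low = (packed & 0x0F).astype(np.int32)   # even-indexed 4-bit values
    high = (packed >> 4).astype(np.int32)    # odd-indexed 4-bit values
    w = np.empty(packed.size * 2, dtype=np.int32)
    w[0::2] = low
    w[1::2] = high
    return (w - zero) * scale
```

The performance gap in the question usually comes from doing this unpack in a separate pass over global memory; fused kernels (as in CUTLASS's mixed-input GEMM) dequantize in registers right before the multiply-accumulate.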