lightllm
LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.
Where is lightllm_ppl_int8kv_flashdecoding_kernel located?
Just as the title says. https://huggingface.co/OpenGVLab/InternVL-Chat-V1-5
https://github.com/ModelTC/lightllm/blob/main/lightllm/common/basemodel/triton_kernel/dequantize_gemm_int4.py I notice there are several candidate parameters to tune for this kernel. How did you find these values? Are they tuned for specific hardware, or are they universal? Do...
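For context on the question above: Triton kernels are commonly tuned by benchmarking a small grid of candidate configurations (block sizes, `num_warps`, etc.) and keeping the fastest, so the winning values are generally hardware-dependent. A minimal, dependency-free sketch of that kind of grid search (the candidate values and the timing function here are hypothetical stand-ins, not the kernel's real knobs):

```python
import itertools

# Hypothetical candidate values, mirroring the kinds of knobs seen in
# Triton kernels such as dequantize_gemm_int4.py (actual values differ).
CANDIDATES = {
    "BLOCK_M": [16, 32, 64],
    "BLOCK_N": [32, 64, 128],
    "num_warps": [4, 8],
}

def benchmark(config):
    """Stand-in for timing a real kernel launch with this config.

    Returns a deterministic fake latency so the sketch runs anywhere;
    in practice you would launch the kernel and measure wall time.
    """
    return (abs(config["BLOCK_M"] - 32)
            + abs(config["BLOCK_N"] - 64)
            + config["num_warps"])

def tune():
    """Enumerate the full grid of candidate configs and keep the fastest."""
    keys = list(CANDIDATES)
    best_cfg, best_time = None, float("inf")
    for values in itertools.product(*(CANDIDATES[k] for k in keys)):
        cfg = dict(zip(keys, values))
        t = benchmark(cfg)
        if t < best_time:
            best_cfg, best_time = cfg, t
    return best_cfg

print(tune())  # fastest config under the fake timer
```

Because the search minimizes measured latency on the machine it runs on, configs found this way are specific to that GPU generation rather than universal.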
As the title says. Thanks in advance for any answer.
python -m lightllm.server.api_server --model_dir /root/autodl-tmp/Qwen2-7B-Instruct --host 0.0.0.0 --port 8000 --trust_remote_code --model_name Qwen2-7B-Instruct --data_type=bfloat16 --eos_id 151643 --tokenizer_mode fast — the server starts normally with the command above, but sending an OpenAI-format request fails with the input-type error below. How can this be resolved?
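For reference, an OpenAI-format chat request body for the server launched above would look roughly like the following. This is a sketch under assumptions: the endpoint path `/v1/chat/completions` and the sampling fields are illustrative, and only the `model` value is taken from the `--model_name` flag in the command.

```python
import json

# Illustrative OpenAI-style chat-completion payload; "model" matches the
# --model_name flag used when starting the server.
payload = {
    "model": "Qwen2-7B-Instruct",
    "messages": [
        {"role": "user", "content": "Hello"},
    ],
    "temperature": 0.7,
    "max_tokens": 128,
}

body = json.dumps(payload)
print(body)
# This JSON would be POSTed to http://0.0.0.0:8000/v1/chat/completions
# with header "Content-Type: application/json" (endpoint path assumed).
```

If the server rejects a request shaped like this with a type error, comparing the failing request body field-by-field against this structure (e.g. `messages` must be a list of `{"role", "content"}` objects, not a plain string) is a reasonable first debugging step.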