lightllm
LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.
Where is lightllm_ppl_int8kv_flashdecoding_kernel located?
Just as the title says. https://huggingface.co/OpenGVLab/InternVL-Chat-V1-5
https://github.com/ModelTC/lightllm/blob/main/lightllm/common/basemodel/triton_kernel/dequantize_gemm_int4.py I notice there are several candidate parameters to tune for this kernel. How did you find these values? Are they tuned for specific hardware, or are they universal? Do...
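For context on the question above: Triton kernels are commonly tuned by benchmarking a small grid of candidate configurations (block sizes, `num_warps`, etc.) and keeping the fastest, so the winning values are generally hardware-dependent. A minimal, dependency-free sketch of that kind of grid search (the candidate values and the timing function here are hypothetical stand-ins, not the kernel's real knobs):

```python
import itertools

# Hypothetical candidate values, mirroring the kinds of knobs seen in
# Triton kernels such as dequantize_gemm_int4.py (actual values differ).
CANDIDATES = {
    "BLOCK_M": [16, 32, 64],
    "BLOCK_N": [32, 64, 128],
    "num_warps": [4, 8],
}

def benchmark(config):
    """Stand-in for timing a real kernel launch with this config.

    Returns a deterministic fake latency so the sketch runs anywhere;
    in practice you would launch the kernel and measure wall time.
    """
    return (abs(config["BLOCK_M"] - 32)
            + abs(config["BLOCK_N"] - 64)
            + config["num_warps"])

def tune():
    """Enumerate the full grid of candidate configs and keep the fastest."""
    keys = list(CANDIDATES)
    best_cfg, best_time = None, float("inf")
    for values in itertools.product(*(CANDIDATES[k] for k in keys)):
        cfg = dict(zip(keys, values))
        t = benchmark(cfg)
        if t < best_time:
            best_cfg, best_time = cfg, t
    return best_cfg

print(tune())  # fastest config under the fake timer
```

Because the search minimizes measured latency on the machine it runs on, configs found this way are specific to that GPU generation rather than universal.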
As the title says. Thanks in advance for any answer.
python -m lightllm.server.api_server --model_dir /root/autodl-tmp/Qwen2-7B-Instruct --host 0.0.0.0 --port 8000 --trust_remote_code --model_name Qwen2-7B-Instruct --data_type=bfloat16 --eos_id 151643 --tokenizer_mode fast — the server starts normally with the command above, but sending an OpenAI-format request fails with the input-type error below. How can this be resolved?
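For reference, an OpenAI-format chat request body for the server launched above would look roughly like the following. This is a sketch under assumptions: the endpoint path `/v1/chat/completions` and the sampling fields are illustrative, and only the `model` value is taken from the `--model_name` flag in the command.

```python
import json

# Illustrative OpenAI-style chat-completion payload; "model" matches the
# --model_name flag used when starting the server.
payload = {
    "model": "Qwen2-7B-Instruct",
    "messages": [
        {"role": "user", "content": "Hello"},
    ],
    "temperature": 0.7,
    "max_tokens": 128,
}

body = json.dumps(payload)
print(body)
# This JSON would be POSTed to http://0.0.0.0:8000/v1/chat/completions
# with header "Content-Type: application/json" (endpoint path assumed).
```

If the server rejects a request shaped like this with a type error, comparing the failing request body field-by-field against this structure (e.g. `messages` must be a list of `{"role", "content"}` objects, not a plain string) is a reasonable first debugging step.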