
LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.

125 lightllm issues

There are many different styles of prompts for different LLMs, such as the openai/llama2 style (notably with support for a SYSTEM role prompt), pure text style, ziya, etc. From api_server.py's parameters, we...

enhancement
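For illustration, here is a minimal sketch of how such prompt styles might be assembled client-side. The llama2 template follows the commonly documented `[INST]`/`<<SYS>>` chat format; none of these helper names are part of lightllm's API.

```python
# Hypothetical client-side prompt builders. The llama2 template follows the
# commonly documented [INST]/<<SYS>> chat format; this is not a lightllm API.

def build_llama2_prompt(system: str, user: str) -> str:
    # llama2-chat style, with an explicit SYSTEM section.
    return f"<s>[INST] <<SYS>>\n{system}\n<</SYS>>\n\n{user} [/INST]"

def build_plain_prompt(system: str, user: str) -> str:
    # Pure text style: plain concatenation, no special tokens.
    return f"{system}\n{user}"

print(build_llama2_prompt("You are a helpful assistant.", "What is LightLLM?"))
```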

Hi, thanks for your great work. Is there any plan to support a generate() function like vLLM or transformers? Without Docker, users could then run generation code directly from a Python script. Like this: ```...

enhancement
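A sketch of the kind of offline entry point being requested, in the spirit of vLLM's `LLM.generate()`; the class and method names below are purely illustrative stubs, not lightllm's actual API.

```python
# Hypothetical offline API modeled on vLLM's LLM.generate(); the names here
# are illustrative only and do not exist in lightllm.

from dataclasses import dataclass
from typing import List

@dataclass
class SamplingParams:
    max_new_tokens: int = 128
    temperature: float = 1.0

class LLM:
    def __init__(self, model_dir: str, tp: int = 1):
        self.model_dir = model_dir  # a real implementation would load weights here
        self.tp = tp

    def generate(self, prompts: List[str], params: SamplingParams) -> List[str]:
        # A real implementation would batch prompts through the inference
        # engine; this stub only shows the requested call shape.
        return [f"<completion for {p!r}>" for p in prompts]

llm = LLM("Linly-AI/Chinese-LLaMA-2-7B-hf", tp=1)
print(llm.generate(["Hello"], SamplingParams(max_new_tokens=32)))
```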

lightllm commit id: 718e6d6dfffc75e7bbfd7ea80ba4afb77aa27726
Model link: https://huggingface.co/Linly-AI/Chinese-LLaMA-2-7B-hf
Server launch command: python -m lightllm.server.api_server --model_dir Linly-AI/Chinese-LLaMA-2-7B-hf --host 0.0.0.0 --port 8100 --tp 1 --max_total_token_num 120000 --tokenizer_mode auto --trust_remote_code
Testing shows a very high first-token latency of about 3s. The issue can be reproduced with the model and launch command above. Could you please take a look at what is causing it?

bug
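A rough way to reproduce the measurement against the server launched above; the `/generate_stream` endpoint and payload shape are assumptions based on lightllm's TGI-like HTTP API, so adjust them for your version.

```python
# Rough time-to-first-token probe. The /generate_stream endpoint and payload
# shape are assumptions based on lightllm's TGI-like HTTP API.

import time
import requests

payload = {"inputs": "Hello, who are you?", "parameters": {"max_new_tokens": 16}}
start = time.time()
with requests.post("http://127.0.0.1:8100/generate_stream",
                   json=payload, stream=True, timeout=60) as resp:
    for chunk in resp.iter_content(chunk_size=None):
        if chunk:  # first streamed bytes roughly mark the first generated token
            print(f"time to first token: {time.time() - start:.2f}s")
            break
```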

Hi, I am using baichuan13B to compare performance with vLLM, and I find that in this case lightllm shows no performance gain. For the same test, the vLLM result is 240s, compared with lightllm's...

Hi, I tried using lightllm with the baichuan13B model, but got the error below. I cannot find any TrainingArguments in the code, so is there anything else that needs to be configured?... The...

Process Process-8:
Process Process-7:
Traceback (most recent call last):
  File "", line 21, in _rms_norm_fwd_fused
KeyError: ('2-.-0-.-0-09caff3db89e80ddf0eb4f72675bc8f9-2b0c5161c53c71b37ae20a9996ee4bb8-c1f92808b4e4644c1732e8338187ac87-d962222789c30252d492a16cca3bf467-12f7ac1ca211e037f62a7c0c323d9990-5c5e32ff210f3b7f56c98ca29917c25e-06f0df2d61979d629033f4a22eff5198-0dd03b0bd512a184b3512b278d9dfa59-d35ab04ae841e2714a253c523530b071', (torch.float16, torch.float16, torch.float16, 'i32', 'i32', 'fp32'), (16384,), (True, True, True, (True, False), (True,...

Add support for MPT models, which are licensed under Apache 2.0 just like LightLLM. They come in 7B and 30B variants with different context lengths; here is one of them: https://huggingface.co/mosaicml/mpt-7b-8k

enhancement

1. How can the extra \n characters be removed from the output?
2. How can inference be run on multiple questions?
3. How can the batch size be changed?
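A minimal sketch addressing the first two questions, assuming a TGI-like `/generate` endpoint on the server from the issues above; the response field name `generated_text` is also an assumption and may differ by version.

```python
# Sketch for questions 1 and 2, assuming lightllm's TGI-like /generate
# endpoint; the "generated_text" field name is an assumption.

import requests

questions = ["What is LightLLM?", "How does token attention work?"]
for q in questions:                      # 2: loop over multiple questions
    resp = requests.post(
        "http://127.0.0.1:8100/generate",
        json={"inputs": q, "parameters": {"max_new_tokens": 64}},
    )
    text = resp.json().get("generated_text", "")
    print(str(text).strip("\n"))         # 1: strip the extra \n characters
```

As for question 3, lightllm batches requests dynamically on the server side, so the effective batch size is bounded by launch parameters such as the `--max_total_token_num` shown in the command above rather than set per request.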

Which TGI version, startup parameters, and hardware is the comparison data with TGI based on?

As mentioned in #20, lightllm performance degrades a lot without tokenizer.json. So for models without this file, would it be reasonable to add some auto conversion...
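One plausible conversion path uses transformers itself: loading with `use_fast=True` builds a fast tokenizer from the slow tokenizer files, and saving it writes tokenizer.json. The model directory below is a hypothetical placeholder.

```python
# One possible auto-conversion: transformers can build a fast tokenizer from
# the slow tokenizer files, and saving it writes tokenizer.json.

from transformers import AutoTokenizer

model_dir = "path/to/model"  # hypothetical directory that lacks tokenizer.json
tok = AutoTokenizer.from_pretrained(model_dir, use_fast=True)
assert tok.is_fast, "transformers has no fast conversion for this tokenizer"
tok.save_pretrained(model_dir)  # writes tokenizer.json alongside the weights
```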