lightllm
LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.
There are many different prompt styles for different LLMs, such as the openai/llama2 style (which notably supports a SYSTEM-role prompt), plain text style, ziya, etc. From api_server.py's parameters, we...
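For reference, a minimal sketch of the Llama-2 chat style this issue mentions, where the SYSTEM prompt is wrapped in `<<SYS>>` markers inside the first `[INST]` block (the helper name here is hypothetical):

```python
# Llama-2 chat format: system prompt inside <<SYS>> markers, user turn
# inside [INST] ... [/INST]. The BOS token is normally added by the tokenizer.
def llama2_prompt(system: str, user: str) -> str:
    return f"[INST] <<SYS>>\n{system}\n<</SYS>>\n\n{user} [/INST]"

print(llama2_prompt("You are a helpful assistant.", "What is LightLLM?"))
```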
Hi, thanks for your great work. Is there any plan to support a generate() function like vllm or transformers? Without Docker, users could then also run generation code from a Python script. Like this: ```...
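For context, this is roughly what the requested offline interface looks like in vLLM itself (this is vLLM's real API; per this issue, lightllm does not expose an equivalent yet):

```python
# Offline generation with vLLM: load weights once, no HTTP server needed.
from vllm import LLM, SamplingParams

prompts = ["Hello, my name is", "The capital of France is"]
sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

llm = LLM(model="Linly-AI/Chinese-LLaMA-2-7B-hf")
outputs = llm.generate(prompts, sampling_params)

for output in outputs:
    print(output.prompt, "->", output.outputs[0].text)
```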
lightllm commit id: 718e6d6dfffc75e7bbfd7ea80ba4afb77aa27726. Model link: https://huggingface.co/Linly-AI/Chinese-LLaMA-2-7B-hf. Server start command: python -m lightllm.server.api_server --model_dir Linly-AI/Chinese-LLaMA-2-7B-hf --host 0.0.0.0 --port 8100 --tp 1 --max_total_token_num 120000 --tokenizer_mode auto --trust_remote_code. Testing shows a very high first-token latency of about 3s. The issue can be reproduced with the model and start command above; could you please look into what is causing it?
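A minimal sketch for measuring time-to-first-token against the server started with the command above, assuming lightllm's TGI-style /generate_stream endpoint and payload schema (adjust both if your version differs):

```python
# Time how long the server takes to emit its first streamed chunk,
# which approximates first-token latency.
import time
import requests

url = "http://127.0.0.1:8100/generate_stream"  # endpoint name is an assumption
payload = {"inputs": "Please introduce yourself.",
           "parameters": {"max_new_tokens": 64}}

start = time.time()
with requests.post(url, json=payload, stream=True) as resp:
    resp.raise_for_status()
    for chunk in resp.iter_content(chunk_size=None):
        if chunk:  # first non-empty chunk ~= first token
            print(f"first token after {time.time() - start:.2f}s")
            break
```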
Hi, I am using baichuan13B to compare performance with vllm, and in this case lightllm shows no performance gain. The same test takes 240s with vllm, compared with lightllm's...
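For a fair side-by-side, one option is to time both servers under identical concurrent load; a rough sketch, with the endpoint URL and payload schema as assumptions:

```python
# Send the same batch of prompts concurrently and time the whole run,
# pointing URL at the lightllm and vllm servers in turn.
import time
import requests
from concurrent.futures import ThreadPoolExecutor

URL = "http://127.0.0.1:8100/generate"  # endpoint assumed

def one_request(prompt: str) -> str:
    payload = {"inputs": prompt, "parameters": {"max_new_tokens": 128}}
    return requests.post(URL, json=payload).text

prompts = [f"Question {i}: explain attention." for i in range(256)]
start = time.time()
with ThreadPoolExecutor(max_workers=64) as pool:
    list(pool.map(one_request, prompts))
print(f"total wall time: {time.time() - start:.1f}s")
```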
Hi, I tried using lightllm with the baichuan13B model, but got the error below. I cannot find any TrainingArguments in the code, so is there anything else that needs to be configured?... The...
Process Process-8: Process Process-7: Traceback (most recent call last): File "<string>", line 21, in _rms_norm_fwd_fused KeyError: ('2-.-0-.-0-09caff3db89e80ddf0eb4f72675bc8f9-2b0c5161c53c71b37ae20a9996ee4bb8-c1f92808b4e4644c1732e8338187ac87-d962222789c30252d492a16cca3bf467-12f7ac1ca211e037f62a7c0c323d9990-5c5e32ff210f3b7f56c98ca29917c25e-06f0df2d61979d629033f4a22eff5198-0dd03b0bd512a184b3512b278d9dfa59-d35ab04ae841e2714a253c523530b071', (torch.float16, torch.float16, torch.float16, 'i32', 'i32', 'fp32'), (16384,), (True, True, True, (True, False), (True,...
Add support for MPT models, which are licensed under Apache 2.0 just like LightLLM. They come in 7B and 30B variants with different context lengths; here is one of them: https://huggingface.co/mosaicml/mpt-7b-8k
Questions about the output
1. How can the extra \n characters be removed from the output? 2. How can inference be run on multiple questions? 3. How can the batch size be changed?
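A minimal sketch addressing all three questions, assuming the HTTP /generate endpoint from the startup command above (the response field name is also an assumption; batching happens server-side and is bounded by flags such as --max_total_token_num rather than a client-side knob):

```python
import requests
from concurrent.futures import ThreadPoolExecutor

URL = "http://127.0.0.1:8100/generate"  # endpoint assumed

def ask(question: str) -> str:
    payload = {"inputs": question, "parameters": {"max_new_tokens": 128}}
    resp = requests.post(URL, json=payload).json()
    text = resp["generated_text"]   # field name assumed; check your version
    if isinstance(text, list):      # some versions may return a list
        text = text[0]
    return text.strip()             # 1. strip the extra leading/trailing "\n"

# 2. multiple questions: fan them out concurrently; the server batches
#    in-flight requests on its own (3. batch capacity is governed by
#    server flags such as --max_total_token_num, not by the client).
questions = ["What is the attention mechanism?", "Explain the KV cache."]
with ThreadPoolExecutor(max_workers=len(questions)) as pool:
    for q, a in zip(questions, pool.map(ask, questions)):
        print(q, "->", a)
```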
Which version is the load-test (benchmark) comparison data provided in the project based on?
Which TGI version, startup parameters, and hardware is the comparison data with TGI based on?
As mentioned in #20, lightllm performance degrades a lot without tokenizer.json. So for models without this file, would it be reasonable to add some automatic conversion...
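One possible auto-conversion, sketched with the standard transformers API: loading with the fast backend converts a slow tokenizer on the fly, and saving the result writes a tokenizer.json:

```python
# Convert a slow (SentencePiece-style) tokenizer to a fast one so that
# tokenizer.json exists on disk (requires the `tokenizers` package).
from transformers import AutoTokenizer

model_dir = "Linly-AI/Chinese-LLaMA-2-7B-hf"  # any model lacking tokenizer.json
tok = AutoTokenizer.from_pretrained(model_dir, use_fast=True)  # converts on load
tok.save_pretrained("./converted")  # ./converted/tokenizer.json now exists
```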