lightllm
LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.
lightllm commit id: 718e6d6dfffc75e7bbfd7ea80ba4afb77aa27726. The chatglm-6b model downloaded from Hugging Face throws an error when the server is started.

Model download link: https://huggingface.co/THUDM/chatglm-6b

Server start command:

```bash
python -m lightllm.server.api_server --model_dir THUDM/chatglm-6b --host 0.0.0.0 --port 8100 --tp 1 --max_total_token_num 120000 --tokenizer_mode auto --trust_remote_code
```

Error message:

```
################
load model error: 'ffn_hidden_size'
'ffn_hidden_size'
File "/lightllm/lightllm/models/chatglm2/layer_weights/transformer_layer_weight.py",...
```
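For context, the `'ffn_hidden_size'` KeyError is consistent with the chatglm2 weight loader reading a config key that chatglm-6b's config.json does not define (chatglm-6b ships `inner_hidden_size`, while chatglm2-6b ships `ffn_hidden_size`). A minimal sketch of a defensive lookup along those lines; the helper name is hypothetical and is not lightllm code:

```python
# Hypothetical sketch: fall back to the chatglm-6b key when the
# chatglm2 key 'ffn_hidden_size' is absent from config.json.
def get_ffn_hidden_size(network_config: dict) -> int:
    if "ffn_hidden_size" in network_config:      # chatglm2-6b style config
        return network_config["ffn_hidden_size"]
    if "inner_hidden_size" in network_config:    # chatglm-6b style config
        return network_config["inner_hidden_size"]
    raise KeyError("ffn_hidden_size")
```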
I have noticed that LightLLM currently seems to support decoding only through **sampling**; additional decoding methods such as **BeamSearch** and **GreedySearch** are not yet supported. I would like to know...
If there's a paper or other proof, that would be even better.
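For reference, the difference between the requested modes comes down to how the next token is chosen from the logits: greedy search takes the argmax, sampling draws from the softmax distribution, and beam search keeps the top-k partial hypotheses instead of a single one. A minimal sketch of the first two, not lightllm's API:

```python
import torch

def pick_next_token(logits: torch.Tensor, greedy: bool, temperature: float = 1.0) -> torch.Tensor:
    # logits: [batch, vocab] scores for the next token.
    if greedy:
        # GreedySearch: deterministically pick the highest-scoring token.
        return logits.argmax(dim=-1)
    # Sampling: draw from the (temperature-scaled) softmax distribution.
    probs = torch.softmax(logits / temperature, dim=-1)
    return torch.multinomial(probs, num_samples=1).squeeze(-1)
```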
**Before you submit an issue, please search for existing issues to avoid duplicates.** **Issue description:** AttributeError: 'LlamaSplitFuseInferStateInfo' object has no attribute 'logn_values' Please provide a clear and concise description of...
```bash
Traceback (most recent call last):
  File "/data/miniconda3/envs/lightllm/lib/python3.10/site-packages/rpyc/core/protocol.py", line 359, in _dispatch_request
    res = self._HANDLERS[handler](self, *args)
  File "/data/miniconda3/envs/lightllm/lib/python3.10/site-packages/rpyc/core/protocol.py", line 837, in _handle_call
    return obj(*args, **dict(kwargs))
  File "/data/code/lightllm/lightllm/server/router/model_infer/model_rpc.py", line 116, in...
```
In [ApiServerArgs.md](https://github.com/ModelTC/lightllm/blob/main/docs/ApiServerArgs.md), an algorithm is described for calculating the optimal `max_total_token_num` argument. This PR automates that calculation. The `max_total_token_num` argument now defaults to None....
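The calculation in ApiServerArgs.md boils down to dividing the memory left after loading weights by the per-token KV-cache footprint. A hedged sketch of that arithmetic; the function and its defaults are illustrative, not the PR's actual code:

```python
def estimate_max_total_token_num(
    gpu_mem_gb: float,         # total memory of one GPU
    weight_mem_gb: float,      # model weights resident on that GPU
    num_layers: int,
    hidden_size: int,
    dtype_bytes: int = 2,      # fp16/bf16
    mem_fraction: float = 0.9, # safety margin for activations, etc.
) -> int:
    # KV cache per token: a K vector and a V vector for every layer.
    kv_bytes_per_token = 2 * num_layers * hidden_size * dtype_bytes
    free_bytes = (gpu_mem_gb - weight_mem_gb) * mem_fraction * 1024**3
    return int(free_bytes // kv_bytes_per_token)

# e.g. llama-7b (32 layers, hidden 4096, ~14 GB fp16 weights) on an 80 GB card:
# estimate_max_total_token_num(80, 14, 32, 4096) -> roughly 120k tokens
```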
**Before you submit an issue, please search for existing issues to avoid duplicates.** **Issue description:** An assertion error is thrown when the world size is not perfectly divisible by the...
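For illustration only, this kind of assertion typically has the following shape: tensor parallelism shards attention heads across ranks, so the counts must divide evenly. The numbers below are made up:

```python
# Hypothetical illustration of the failing divisibility check.
num_attention_heads = 40   # e.g. a 13B-class model
world_size = 3             # e.g. --tp 3

assert num_attention_heads % world_size == 0, (
    f"num_attention_heads ({num_attention_heads}) must be divisible "
    f"by the world size ({world_size})"
)
```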
`AttributeError: 'LlamaTransformerLayerWeight' object has no attribute 'q_weight_'` In the 4-bit version, only "att_norm_weight_" and "ffn_norm_weight" seem to be present. When will the 4-bit version be supported?
Unstable output under high concurrency
I wrote a prompt that requires output in a fixed format, and the prompt includes the reasoning steps. With chatglm2 the output is fairly stable at low concurrency, but the format drifts more and more as concurrency rises. I tested concurrency levels from 2 to 40, and the loss varies a lot between them. I then tried adjusting frequency_penalty and temperature; both affect the result, and it is hard to tell which parameter has which effect. I would like advice on how to set these parameters under high concurrency so that the output stays stable.
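If format stability matters more than diversity, a common first step is to make decoding deterministic so results do not drift with load. A hedged example request; the endpoint and parameter names follow lightllm's HTTP API as I understand it, so verify them against your version:

```python
import requests

# Hypothetical host/port; adjust to your deployment.
resp = requests.post(
    "http://0.0.0.0:8100/generate",
    json={
        "inputs": "<your fixed-format prompt>",
        "parameters": {
            "do_sample": False,        # greedy-like decoding: stable across runs
            "temperature": 1.0,        # ignored when do_sample is False
            "frequency_penalty": 0.0,  # leave at 0 unless you see repetition
            "max_new_tokens": 512,
        },
    },
)
print(resp.json())
```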