lightllm
LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.
lightllm commit id: 718e6d6dfffc75e7bbfd7ea80ba4afb77aa27726. The chatglm-6b model downloaded from Hugging Face throws an error when the server is started.

Model download link: https://huggingface.co/THUDM/chatglm-6b

Server start command:

```bash
python -m lightllm.server.api_server --model_dir THUDM/chatglm-6b --host 0.0.0.0 --port 8100 --tp 1 --max_total_token_num 120000 --tokenizer_mode auto --trust_remote_code
```

Error message:

```
################
load model error: 'ffn_hidden_size'
'ffn_hidden_size'
File "/lightllm/lightllm/models/chatglm2/layer_weights/transformer_layer_weight.py",...
```
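For context, the `'ffn_hidden_size'` KeyError is consistent with the chatglm2 weight loader reading a config key that chatglm-6b's config.json does not define (chatglm-6b ships `inner_hidden_size`, while chatglm2-6b ships `ffn_hidden_size`). A minimal sketch of a defensive lookup along those lines; the helper name is hypothetical and is not lightllm code:

```python
# Hypothetical sketch: fall back to the chatglm-6b key when the
# chatglm2 key 'ffn_hidden_size' is absent from config.json.
def get_ffn_hidden_size(network_config: dict) -> int:
    if "ffn_hidden_size" in network_config:      # chatglm2-6b style config
        return network_config["ffn_hidden_size"]
    if "inner_hidden_size" in network_config:    # chatglm-6b style config
        return network_config["inner_hidden_size"]
    raise KeyError("ffn_hidden_size")
```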
I have noticed that LightLLM currently seems to support decoding only through **sampling**; additional decoding methods such as **BeamSearch** and **GreedySearch** are not yet supported. I would like to know...
If there's a paper or other proof, that would be even better.
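For reference, the difference between the requested modes comes down to how the next token is chosen from the logits: greedy search takes the argmax, sampling draws from the softmax distribution, and beam search keeps the top-k partial hypotheses instead of a single one. A minimal sketch of the first two, not lightllm's API:

```python
import torch

def pick_next_token(logits: torch.Tensor, greedy: bool, temperature: float = 1.0) -> torch.Tensor:
    # logits: [batch, vocab] scores for the next token.
    if greedy:
        # GreedySearch: deterministically pick the highest-scoring token.
        return logits.argmax(dim=-1)
    # Sampling: draw from the (temperature-scaled) softmax distribution.
    probs = torch.softmax(logits / temperature, dim=-1)
    return torch.multinomial(probs, num_samples=1).squeeze(-1)
```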
**Before you submit an issue, please search for existing issues to avoid duplicates.** **Issue description:** AttributeError: 'LlamaSplitFuseInferStateInfo' object has no attribute 'logn_values' Please provide a clear and concise description of...
```bash
Traceback (most recent call last):
  File "/data/miniconda3/envs/lightllm/lib/python3.10/site-packages/rpyc/core/protocol.py", line 359, in _dispatch_request
    res = self._HANDLERS[handler](self, *args)
  File "/data/miniconda3/envs/lightllm/lib/python3.10/site-packages/rpyc/core/protocol.py", line 837, in _handle_call
    return obj(*args, **dict(kwargs))
  File "/data/code/lightllm/lightllm/server/router/model_infer/model_rpc.py", line 116, in...
```
In [ApiServerArgs.md](https://github.com/ModelTC/lightllm/blob/main/docs/ApiServerArgs.md), an algorithm is described for calculating the optimal `max_total_token_num` argument. This PR automates that calculation. The `max_total_token_num` argument now defaults to None....
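The calculation in ApiServerArgs.md boils down to dividing the memory left after loading weights by the per-token KV-cache footprint. A hedged sketch of that arithmetic; the function and its defaults are illustrative, not the PR's actual code:

```python
def estimate_max_total_token_num(
    gpu_mem_gb: float,         # total memory of one GPU
    weight_mem_gb: float,      # model weights resident on that GPU
    num_layers: int,
    hidden_size: int,
    dtype_bytes: int = 2,      # fp16/bf16
    mem_fraction: float = 0.9, # safety margin for activations, etc.
) -> int:
    # KV cache per token: a K vector and a V vector for every layer.
    kv_bytes_per_token = 2 * num_layers * hidden_size * dtype_bytes
    free_bytes = (gpu_mem_gb - weight_mem_gb) * mem_fraction * 1024**3
    return int(free_bytes // kv_bytes_per_token)

# e.g. llama-7b (32 layers, hidden 4096, ~14 GB fp16 weights) on an 80 GB card:
# estimate_max_total_token_num(80, 14, 32, 4096) -> roughly 120k tokens
```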
**Before you submit an issue, please search for existing issues to avoid duplicates.** **Issue description:** An assertion error is thrown when the world size is not perfectly divisible by the...
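For illustration only, this kind of assertion typically has the following shape: tensor parallelism shards attention heads across ranks, so the counts must divide evenly. The numbers below are made up:

```python
# Hypothetical illustration of the failing divisibility check.
num_attention_heads = 40   # e.g. a 13B-class model
world_size = 3             # e.g. --tp 3

assert num_attention_heads % world_size == 0, (
    f"num_attention_heads ({num_attention_heads}) must be divisible "
    f"by the world size ({world_size})"
)
```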
`AttributeError: 'LlamaTransformerLayerWeight' object has no attribute 'q_weight_'` In the 4-bit version, only "att_norm_weight_" and "ffn_norm_weight" seem to be present. When will the 4-bit version be supported?
Unstable output under high concurrency
I wrote a prompt that requires output in a fixed format, and the prompt includes the reasoning steps. With chatglm2 the output is fairly stable at low concurrency, but the format drifts more and more as concurrency rises. I tested concurrency levels from 2 to 40, and the loss varies a lot between them. I then tried adjusting frequency_penalty and temperature; both affect the result, and it is hard to tell which parameter has which effect. I would like advice on how to set these parameters under high concurrency so that the output stays stable.
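If format stability matters more than diversity, a common first step is to make decoding deterministic so results do not drift with load. A hedged example request; the endpoint and parameter names follow lightllm's HTTP API as I understand it, so verify them against your version:

```python
import requests

# Hypothetical host/port; adjust to your deployment.
resp = requests.post(
    "http://0.0.0.0:8100/generate",
    json={
        "inputs": "<your fixed-format prompt>",
        "parameters": {
            "do_sample": False,        # greedy-like decoding: stable across runs
            "temperature": 1.0,        # ignored when do_sample is False
            "frequency_penalty": 0.0,  # leave at 0 unless you see repetition
            "max_new_tokens": 512,
        },
    },
)
print(resp.json())
```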