lightllm
LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.
Same as https://github.com/vllm-project/vllm/issues/182#issuecomment-1627176207
When you install triton 2.0.0.dev20221202, find compiler.py in ****/python3.9/site-packages/triton/ and change L998 - L1018 to
Hello, deployment and inference work fine when I use a single A800, but multi-GPU inference raises an error: `Task exception was never retrieved future: Traceback (most recent call last): File "/opt/conda/lib/python3.9/site-packages/rpyc/core/stream.py", line 268, in read buf = self.sock.recv(min(self.MAX_IO_CHUNK, count)) ConnectionResetError: [Errno 104] Connection reset by peer...
# LightLLM run-through: reproducing the kvoff branch

##### Step 1: create the docker container

Pull the image: `docker pull ghcr.io/modeltc/lightllm:main`

The llama-7b model is large, and cloning it directly inside the server's docker container kept failing with network interruptions, so I downloaded the model locally, transferred it to the server with Xftp, and then mapped the model folder onto the models folder of the lightllm source tree when creating the container.

Model repository: [huggyllama/llama-7b · Hugging Face](https://huggingface.co/huggyllama/llama-7b)

```
docker run -itd --ipc=host --net=host --name lxn_lightllm --gpus all -p 8080:8080 -v /hdd/lxn/llama-7b:/lightllm/lightllm/models/llama-7b ghcr.io/modeltc/lightllm:main /bin/bash
```
...
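Once the container is up, a quick way to confirm the volume mapping worked is to check the mapped model directory from inside the container before launching the server. A minimal sketch (the paths are the ones from the `docker run` command above and are specific to this setup):

```python
# Sketch: verify that the llama-7b folder mapped via -v is visible inside the container.
import json
import os

model_dir = "/lightllm/lightllm/models/llama-7b"
print("model_dir exists:", os.path.isdir(model_dir))

config_path = os.path.join(model_dir, "config.json")
if os.path.isfile(config_path):
    with open(config_path) as f:
        cfg = json.load(f)
    # A llama-7b checkpoint should report model_type "llama" and 32 hidden layers.
    print("model_type:", cfg.get("model_type"))
    print("num_hidden_layers:", cfg.get("num_hidden_layers"))
else:
    print("config.json not found; check the -v volume mapping in docker run")
```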
Error report: on a V100 machine with 32 GB of GPU memory, the server starts normally, but running a single short query raises an out-of-memory error. `python3 -m lightllm.server.api_server --model_dir /app/baichuan2-13B --trust_remote_code --host 0.0.0.0 --port 8080 --tp 1 --max_total_token_num 6000` Using a slow tokenizer. This might cause a significant slowdown. Consider...
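For context, a back-of-the-envelope estimate suggests why a 13B model with `--max_total_token_num 6000` can push a 32 GB card over the edge. This is only a sketch, assuming fp16 weights and roughly 40 layers with hidden size 5120 for Baichuan2-13B; it is not an exact accounting of lightllm's allocator:

```python
# Rough memory arithmetic (assumed shapes: ~13e9 params, 40 layers, hidden 5120, fp16).
params = 13e9
bytes_per_value = 2                      # fp16
weights_gb = params * bytes_per_value / 1024**3

layers, hidden = 40, 5120
kv_bytes_per_token = layers * 2 * hidden * bytes_per_value   # K and V per layer
max_total_token_num = 6000
kv_cache_gb = max_total_token_num * kv_bytes_per_token / 1024**3

print(f"weights  ~= {weights_gb:.1f} GB")    # ~24 GB
print(f"kv cache ~= {kv_cache_gb:.1f} GB")   # ~4.6 GB
print(f"total before activations ~= {weights_gb + kv_cache_gb:.1f} GB on a 32 GB V100")
```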
1. Running `python -m lightllm.server.api_server --model_dir baichuan-13b --host 0.0.0.0 --port 8080 --tp 1 --max_total_token_num 4096 --trust_remote_code` succeeds, and the log shows: INFO: Started server process [560] INFO: Waiting for application...
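Once the server reports it has started, a quick smoke test is to hit the `/generate` endpoint. A minimal sketch using `requests`; the endpoint and payload shape follow the example in the LightLLM README, and the prompt and sampling parameters here are arbitrary:

```python
# Smoke test against a running lightllm api_server; adjust host/port to your launch flags.
import requests

url = "http://localhost:8080/generate"
payload = {
    "inputs": "What is AI?",
    "parameters": {"max_new_tokens": 17},
}
resp = requests.post(url, json=payload, timeout=60)
print(resp.status_code)
print(resp.json())
```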
Thanks for the project! We want to run lightllm directly in a cloud container environment, where providing a local `model_dir` is harder than providing a huggingface model...
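One workaround, until a Hugging Face model id is accepted directly, is to download the snapshot at container start and pass the resulting directory as `--model_dir`. A minimal sketch using `huggingface_hub`; the repo id and local path are only examples:

```python
# Sketch: fetch a model snapshot from the Hugging Face Hub into a local directory,
# then point lightllm's --model_dir at it.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="huggyllama/llama-7b",       # example repo id
    local_dir="/models/llama-7b",        # hypothetical container path
)
print("pass this to --model_dir:", local_dir)
# e.g.  python -m lightllm.server.api_server --model_dir /models/llama-7b --tp 1 ...
```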
I have a question from reading the code. I notice that in `~/lightllm/models/llama2/layer_infer/transformer_layer_infer.py`, flash attention is only applied in the prefill stage, i.e. `context_attention_fwd`, but not to...
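To make the distinction concrete, here is a small PyTorch sketch (single head, no batching; not lightllm's actual Triton kernels) of why the two stages are usually handled by different kernels: prefill computes a full causal seq_len x seq_len attention over the prompt, while each decode step is a single new query token attending over the KV cache:

```python
# Illustrative only; shapes are arbitrary.
import torch

head_dim, n_ctx = 64, 128
k_cache = torch.randn(n_ctx, head_dim)
v_cache = torch.randn(n_ctx, head_dim)

# Prefill (context_attention_fwd-like): all prompt tokens are queries at once,
# so scores are (n_ctx, n_ctx) with a causal mask -- a flash-attention-style kernel pays off here.
q_prefill = torch.randn(n_ctx, head_dim)
scores = (q_prefill @ k_cache.T) / head_dim ** 0.5
causal_mask = torch.ones(n_ctx, n_ctx).triu(1).bool()
prefill_out = torch.softmax(scores.masked_fill(causal_mask, float("-inf")), dim=-1) @ v_cache

# Decode: one new query token per step, so scores are just (1, n_ctx) over the KV cache --
# a memory-bound lookup, which is why a separate decode (token) attention path is used.
q_decode = torch.randn(1, head_dim)
decode_scores = (q_decode @ k_cache.T) / head_dim ** 0.5
decode_out = torch.softmax(decode_scores, dim=-1) @ v_cache

print(prefill_out.shape, decode_out.shape)   # torch.Size([128, 64]) torch.Size([1, 64])
```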
**Issue description:** The current implementation seems to differ from the standard OpenAI output:
1. finish_reason is always null, even once the last token has been generated; normally it should be "stop" or "length" (see the sketch below).
2. index is always 0.
3. The stop parameter is not supported yet: "The stop parameter is not currently supported".
4. With --eos_id 151645 set when starting the server, generation does terminate there, but the eos token is still returned in the output; normally that token should not be returned.

**Steps to reproduce:** Example request: { "model": "Qwen", "messages": [ { "role": "user",...
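For reference, a rough sketch of the shape an OpenAI-style chat completion response is expected to have; the values here are illustrative and not lightllm's actual output:

```python
# The point is that "finish_reason" should be "stop" (eos/stop string hit) or "length"
# (max tokens reached), and the eos token text should not appear in "content".
expected_response = {
    "id": "chatcmpl-example",            # illustrative id
    "object": "chat.completion",
    "model": "Qwen",
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "..."},
            "finish_reason": "stop",     # or "length" when max tokens is reached
        }
    ],
    "usage": {"prompt_tokens": 10, "completion_tokens": 20, "total_tokens": 30},
}
print(expected_response["choices"][0]["finish_reason"])
```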