
LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.

Results: 125 lightllm issues

How to use 8bit quantized models? Can I run GGML/GGUF models?

The solution I currently use is `pkill -9 -f lightllm.server.api_server` followed by `fuser -k /dev/nvidia0`, and as you can see it also kills other processes (and my boss will kill me), so please...
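
A less blunt workaround (a sketch only, not part of lightllm's tooling) is to match the server's command line instead of killing everything that holds the GPU device; the `psutil` usage and the match string below are assumptions for illustration:

```python
# Sketch: terminate only processes whose command line mentions the lightllm
# API server, instead of fuser-killing everything attached to /dev/nvidia0.
# psutil is a third-party dependency; the pattern string is an assumption.
import psutil

def kill_lightllm_server(pattern: str = "lightllm.server.api_server") -> None:
    for proc in psutil.process_iter(["pid", "cmdline"]):
        cmdline = " ".join(proc.info["cmdline"] or [])
        if pattern in cmdline:
            try:
                print(f"terminating pid {proc.info['pid']}: {cmdline}")
                proc.terminate()  # SIGTERM first; escalate to proc.kill() if needed
            except psutil.Error:
                pass  # process already gone or not ours to signal

if __name__ == "__main__":
    kill_lightllm_server()
```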

Below are my test results on an A100-SXM-80G. vLLM: `python -m vllm.entrypoints.api_server --model /code/llama-65b-hf --swap-space 16 --disable-log-requests --tensor-parallel-size 8`, then `python benchmarks/benchmark_serving.py --tokenizer /code/llama-65b-hf --dataset /code/ShareGPT_V3_unfiltered_cleaned_split.json`. Total time: 312.02 s; Throughput: 3.20 requests/s; Average latency: 125.45 s; Average...
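
As a quick sanity check on the reported numbers, assuming the ShareGPT benchmark script sent its usual 1000 prompts (an assumption, since the count is not quoted above):

```python
# Sanity check of the reported vLLM figures, assuming 1000 benchmark requests.
num_requests = 1000
total_time_s = 312.02
throughput = num_requests / total_time_s
print(f"throughput ~ {throughput:.2f} requests/s")  # ~3.20, matching the report
# Average latency (125.45 s) is per-request wall time under heavy queuing, so
# it can greatly exceed total_time / num_requests when requests overlap.
```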

Any plans to support https://github.com/deepseek-ai/deepseek-coder/ in the near future?

bug

When I tried to add some stop words for the model, I found the stop_sequences parameter in lightllm/server/sampling_params.py. There seems to be an issue with line 67: `if stop_str_ids is not None and...`
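
For context, the sketch below shows the behavior that token-level stop sequences are meant to provide; it is illustrative only and is not the actual code in lightllm/server/sampling_params.py:

```python
# Illustrative stop-sequence check: stop generation once the generated token
# ids end with any of the tokenized stop sequences.
from typing import List, Optional

def hits_stop_sequence(generated_ids: List[int],
                       stop_sequences: Optional[List[List[int]]]) -> bool:
    """Return True if generated_ids ends with any tokenized stop sequence."""
    if not stop_sequences:
        return False
    for stop_ids in stop_sequences:
        if stop_ids and generated_ids[-len(stop_ids):] == stop_ids:
            return True
    return False

# Example: stop once the (made-up) ids [13, 13] appear at the end.
assert hits_stop_sequence([5, 8, 13, 13], [[13, 13]])
```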

bug

Background: when TGI is adapted to lightllm and a model is loaded across multiple GPUs, one process is spawned per GPU, and each process loads the entire model into host memory. For large checkpoints, e.g. 65B and above, loading on 8 GPUs then needs 8 * 130 GB of RAM, which is clearly unreasonable and leads to OOM. Proposed solution: lightllm could provide a load_from_weight_dict(weight_dict) interface; the TGI layer would pass in the weight dict and release host memory as each weight is loaded, which would solve the problem.
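
load_from_weight_dict is a proposed interface here, not an existing lightllm API. The PyTorch sketch below only illustrates the load-and-release idea: each host tensor is dropped as soon as it has been copied to the GPU, so peak RAM stays far below a full copy of the checkpoint per process.

```python
# Sketch of the "load while releasing" idea behind the proposed
# load_from_weight_dict(weight_dict) interface. Not lightllm code.
import torch

def load_from_weight_dict(model: torch.nn.Module,
                          weight_dict: dict,
                          device: str = "cuda") -> None:
    """Copy weights into `model` one tensor at a time, dropping each host
    tensor as soon as it has been consumed."""
    params = dict(model.named_parameters())
    for name in list(weight_dict.keys()):
        cpu_tensor = weight_dict.pop(name)   # remove the host reference early
        if name in params:
            params[name].data = cpu_tensor.to(device, non_blocking=True)
        del cpu_tensor                       # allow the host copy to be freed
```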

issue: https://github.com/ModelTC/lightllm/issues/277

When specifying 'max new tokens', LightLLM's output consistently matches this maximum value. However, Transformers sometimes adjusts according to the model itself, resulting in outputs shorter than the specified 'max new...
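
The difference usually comes down to whether EOS is honored as a stop condition. A minimal decode-loop sketch (all names are placeholders, not lightllm APIs) shows why outputs always run to max_new_tokens when it is not:

```python
# Minimal decode loop: without the EOS check, exactly max_new_tokens tokens
# are always emitted. `sample_next_token` and `eos_token_id` are placeholders.
def generate(prompt_ids, sample_next_token, eos_token_id,
             max_new_tokens, stop_on_eos=True):
    out = list(prompt_ids)
    for _ in range(max_new_tokens):
        next_id = sample_next_token(out)
        out.append(next_id)
        if stop_on_eos and next_id == eos_token_id:
            break  # Transformers-style early stop on end-of-sequence
    return out
```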

bug

In my tests, the sqlcoder2 model (based on starcoder) runs faster than on vLLM, but its output differs substantially from the original model. Is this entirely because beam search is not supported?
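
One way to isolate the decoding strategy from the serving backend is to compare greedy and beam-search outputs of the reference Hugging Face model; the sketch below assumes a local model path and prompt and uses only the standard transformers generate API:

```python
# Sketch: compare greedy vs. beam-search outputs of the reference HF model to
# see how much of the quality gap comes from decoding strategy alone.
# Model path and prompt are placeholders; device_map="auto" needs accelerate.
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("/path/to/sqlcoder2")
model = AutoModelForCausalLM.from_pretrained("/path/to/sqlcoder2", device_map="auto")
inputs = tok("List the top 5 customers by revenue:", return_tensors="pt").to(model.device)

greedy = model.generate(**inputs, max_new_tokens=128, do_sample=False)
beam = model.generate(**inputs, max_new_tokens=128, do_sample=False, num_beams=4)
print(tok.decode(greedy[0], skip_special_tokens=True))
print(tok.decode(beam[0], skip_special_tokens=True))
```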

**Before you submit an issue, please search for existing issues to avoid duplicates.** **Issue description:** Please provide a clear and concise description of your issue. **Steps to reproduce:** Please list...

bug