lightllm
LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.
Hi, thanks for your great work. Are there any plans to support models like GPT-Neo and GPT-NeoX?
Can multiple model instances be loaded on one 3090 (24 GB) GPU, like Triton Inference Server does?
How can this problem be solved? `self.value_buffer = [torch.empty((size, head_num, head_dim), dtype=dtype, device="cuda") for _ in range(layer_num)]` fails with `torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 1.14 GiB (GPU 0; 79.35...`
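For context, the footprint of those per-layer buffers can be estimated up front. A minimal sketch of the arithmetic, with made-up shape values since the issue does not include the actual model config:

```python
import torch

# Hypothetical config -- the issue does not state the real values.
size, head_num, head_dim, layer_num = 76800, 64, 128, 80
dtype = torch.float16

# Bytes per element for the chosen dtype (2 for fp16).
elem_bytes = torch.tensor([], dtype=dtype).element_size()

# One value buffer per layer, plus a matching key buffer of the same shape.
per_layer = size * head_num * head_dim * elem_bytes
total = per_layer * layer_num * 2

print(f"per-layer buffer: {per_layer / 1024**3:.2f} GiB")  # ~1.17 GiB
print(f"total KV cache:   {total / 1024**3:.2f} GiB")      # ~187.50 GiB
```

With numbers like these, the full cache needs far more than the 79.35 GiB the GPU reports, so the usual fix is to lower the preallocated token capacity (`size`, driven by LightLLM's `--max_total_token_num` flag) rather than to free memory elsewhere.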
Error when calling
`/usr/bin/ld: skipping incompatible /usr/lib32/libcuda.so when searching for -lcuda /usr/bin/ld: cannot find -lcuda: No such file or directory /usr/bin/ld: skipping incompatible /usr/lib32/libcuda.so when searching for -lcuda collect2: error: ld returned 1...`
As the title mentions.
requirements.txt pins torch 2.0.0, which conflicts with triton 2.1.0 at install time. Workaround used: install triton 2.0.0 during setup, then separately upgrade triton to 2.1.0 afterwards. The server starts normally, but requests then fail with: > /root/.triton/llvm/llvm+mlir-17.0.0-x86_64-linux-gnu-centos-7-release/include/llvm/Support/Casting.h:566: decltype(auto) llvm::cast(const From&) [with To = mlir::triton::gpu::BlockedEncodingAttr; From = mlir::Attribute]: Assertion `isa<To>(Val) && "cast<Ty>() argument of incompatible type!"' failed. Base environment: Red Hat 7, ...
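Before digging into the MLIR assertion itself, it may be worth confirming which torch/triton pair is actually live in the environment after the two-step install described above; a trivial runtime check:

```python
# Print the versions the server process would actually import.
# torch 2.0.0 pins triton 2.0.0 as a dependency, so a separately
# upgraded triton 2.1.0 can disagree with what torch expects.
import torch
import triton

print("torch :", torch.__version__)
print("triton:", triton.__version__)
print("cuda  :", torch.version.cuda)
```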
[context_flashattention_nopad_fp16_fp8.txt](https://github.com/user-attachments/files/16421521/context_flashattention_nopad_fp16_fp8.txt) We have implemented an FP8 version of context_flashattention_nopad.py. The V shape needs to be changed for the performance improvement described in https://triton-lang.org/main/getting-started/tutorials/06-fused-attention.html. However, the current result is not correct; could...
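For reference, the V-shape change the linked Triton tutorial describes amounts to storing V transposed so the second matmul (P · V) sees an FP8-friendly memory layout. A PyTorch-level sketch of that transform, with illustrative shapes only (this is not the attached kernel's actual code):

```python
import torch

# Illustrative shapes only -- not LightLLM's real KV layout.
batch, heads, seq_len, head_dim = 1, 8, 1024, 64
v = torch.randn(batch, heads, seq_len, head_dim, dtype=torch.float16)

# The fp16 kernel computes p @ v with v laid out (seq_len, head_dim) per head.
# The tutorial's fp8 path instead stores v transposed in memory, so the second
# tl.dot gets an operand layout that fp8 tensor-core instructions accept.
v_fp8_layout = v.transpose(-2, -1).contiguous()  # (batch, heads, head_dim, seq_len)
print(tuple(v.shape), "->", tuple(v_fp8_layout.shape))
```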