llama-gpt
                        Why is it so slow?
My server information
CPU: 32 cores
GPU: 2x Nvidia RTX 3080 10 GB
RAM: 64 GB
Docker version 24.0.2
NVIDIA-SMI 520.61.05
Driver Version: 520.61.05
CUDA Version: 11.8
I deployed it using Docker with --model code-13b and --with-cuda. I changed the base image to 11.8.0-devel-ubuntu22.04 and set devices.count=2. In my testing, each token takes about 1 minute to generate, so I want to know why it's so slow and what adjustments I can make.
Super slow here too.
Running the 7b model on a 24G ram 8-core Xeon.
I installed it using UmbrelOS on a proxmox debian 12 LXC.
It's slow here too!
@sducxh - two things:
- In /cuda/run.sh, the key value to adjust to speed things up is n_gpu_layers. The hard-coded default of 10 is too low for beefy graphics cards; setting this to 40 for my 3080 Ti made a huge improvement. Try incrementing it in steps of 5 and restart/retest.
- I don't have a second GPU to test whether this helps split the load across multiple cards, but in docker-compose-cuda-gguf.yml, try setting 'count' to 2 and study your GPU usage.
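For reference, the multi-GPU change above would look roughly like this in docker-compose-cuda-gguf.yml. This is a sketch, not the exact file from the repo: the service name is an assumption, and your file layout may differ between llama-gpt versions, but the deploy/reservations block follows the standard Docker Compose GPU syntax:

```yaml
# docker-compose-cuda-gguf.yml (sketch) -- expose both GPUs to the container
services:
  llama-gpt-api:            # service name may differ in your compose file
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 2      # was 1; request both 3080s
              capabilities: [gpu]
```

After changing count, run nvidia-smi while generating tokens to confirm both cards actually show memory and compute usage; if only one is active, the layer split did not take effect.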