
Bug: -ngl doesn't work when running as a systemd service

Open takelley1 opened this issue 6 months ago • 1 comment

Contact Details

[email protected]

What happened?

I expected llamafile to offload compute to the GPU when run as a systemd service, but that didn't happen.

Here's the systemd service file:

[Unit]
Description=Run Llamafile in server mode
After=network.target

[Service]
Type=simple
ExecStart=/home/ubuntu/run_llamafile.sh
Restart=always

[Install]
WantedBy=default.target
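
My guess at the cause (untested): when run interactively, llamafile finds its extracted GPU modules under my home directory, but the log below shows it probing the relative path ./.llamafile/v/0.8.11/, which depends on the working directory, and a systemd service doesn't start in my home directory or with my login environment by default. Here's a sketch of the unit with the user, working directory, and HOME pinned explicitly; the ubuntu user and paths are assumptions from my setup:

[Unit]
Description=Run Llamafile in server mode
After=network.target

[Service]
Type=simple
# Assumption: run as the user whose home holds the .llamafile cache
User=ubuntu
# Assumption: the relative ./.llamafile path in the log resolves against this
WorkingDirectory=/home/ubuntu
Environment=HOME=/home/ubuntu
ExecStart=/home/ubuntu/run_llamafile.sh
Restart=always

[Install]
WantedBy=default.target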

Here's the script the service runs. Executing it directly with bash from a command prompt offloads to the GPU as expected, but offloading doesn't happen when systemd calls it.

#!/usr/bin/env bash

/home/ubuntu/llama3.1-8b-instruct.llamafile --server --nobrowser -ngl 999 --host 0.0.0.0 -c 0

The llamafile falls back to using the CPU when run as a systemd service.
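
To compare the interactive and systemd environments, one option is to have the wrapper script record what it actually sees at startup, then diff the two runs (a sketch; the log path /tmp/llamafile-env.log is an arbitrary choice):

#!/usr/bin/env bash

# Sketch: log the runtime context before launching, so an interactive run
# can be diffed against a systemd-started run.
{
  echo "=== $(date) ==="
  echo "cwd:  $PWD"
  echo "HOME: $HOME"
  echo "PATH: $PATH"
} >> /tmp/llamafile-env.log

/home/ubuntu/llama3.1-8b-instruct.llamafile --server --nobrowser -ngl 999 --host 0.0.0.0 -c 0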

Version info:

$ uname -a
Linux 6.2.0-1011-aws #11~22.04.1-Ubuntu SMP Mon Aug 21 16:27:59 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

$ cat /etc/os-release 
PRETTY_NAME="Ubuntu 22.04.3 LTS"
NAME="Ubuntu"
VERSION_ID="22.04"
VERSION="22.04.3 LTS (Jammy Jellyfish)"
VERSION_CODENAME=jammy
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=jammy

$ nvidia-smi
Thu Aug  8 18:56:10 2024       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.54.03              Driver Version: 535.54.03    CUDA Version: 12.2     |

Version

llamafile v0.8.11

What operating system are you seeing the problem on?

Linux

Relevant log output

Aug 08 18:42:12 ip-10-128-15-31 run_llamafile.sh[50781]: import_cuda_impl: initializing gpu module...
Aug 08 18:42:12 ip-10-128-15-31 run_llamafile.sh[50781]: get_rocm_bin_path: note: hipcc not found on $PATH
Aug 08 18:42:12 ip-10-128-15-31 run_llamafile.sh[50781]: get_rocm_bin_path: note: $HIP_PATH/bin/hipcc does not exist
Aug 08 18:42:12 ip-10-128-15-31 run_llamafile.sh[50781]: get_rocm_bin_path: note: /opt/rocm/bin/hipcc does not exist
Aug 08 18:42:12 ip-10-128-15-31 run_llamafile.sh[50781]: link_cuda_dso: note: dynamically linking ./.llamafile/v/0.8.11/ggml-rocm.so
Aug 08 18:42:12 ip-10-128-15-31 run_llamafile.sh[50781]: link_cuda_dso: warning: libamdhip64.so.6: cannot open shared object file: No such file or directory: failed to load library
Aug 08 18:42:12 ip-10-128-15-31 run_llamafile.sh[50781]: link_cuda_dso: note: dynamically linking ./.llamafile/v/0.8.11/ggml-cuda.so
Aug 08 18:42:12 ip-10-128-15-31 run_llamafile.sh[50781]: link_cuda_dso: warning: ./.llamafile/v/0.8.11/ggml-cuda.so: cannot open shared object file: No such file or directory: failed to l>
Aug 08 18:42:12 ip-10-128-15-31 run_llamafile.sh[50781]: {"function":"server_params_parse","level":"WARN","line":2437,"msg":"Not compiled with GPU offload support, --n-gpu-layers option w>
Aug 08 18:42:12 ip-10-128-15-31 run_llamafile.sh[50781]: note: if you have an AMD or NVIDIA GPU then you need to pass -ngl 9999 to enable GPU offloading
Aug 08 18:42:12 ip-10-128-15-31 run_llamafile.sh[50781]: {"build":1500,"commit":"a30b324","function":"server_cli","level":"INFO","line":2869,"msg":"build info","tid":"10437056","timestamp>
Aug 08 18:42:12 ip-10-128-15-31 run_llamafile.sh[50781]: {"function":"server_cli","level":"INFO","line":2872,"msg":"system info","n_threads":24,"n_threads_batch":-1,"system_info":"AVX = 1>
Aug 08 18:42:12 ip-10-128-15-31 run_llamafile.sh[50781]: llama_model_loader: loaded meta data with 21 key-value pairs and 291 tensors from Meta-Llama-3.1-8B-Instruct.Q6_K.gguf (version GG>
Aug 08 18:42:12 ip-10-128-15-31 run_llamafile.sh[50781]: llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
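
Both "cannot open shared object file" warnings above reference the relative path ./.llamafile/v/0.8.11/, which suggests the service isn't resolving it against my home directory. Two quick checks (llamafile.service is a placeholder for the actual unit name):

# Confirm where the extracted GPU modules actually live
ls -ld /home/ubuntu/.llamafile/v/0.8.11
# Inspect the context the unit runs with
systemctl show llamafile.service -p User -p WorkingDirectory -p Environment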

takelley1 · Aug 08 '24 18:08