Bug: -ngl doesn't work when running as a systemd service
What happened?
I expected llamafile to offload compute to the GPU when running as a systemd service, but that didn't happen.
Here's the systemd service file:
[Unit]
Description=Run Llamafile in server mode
After=network.target
[Service]
Type=simple
ExecStart=/home/ubuntu/run_llamafile.sh
Restart=always
[Install]
WantedBy=default.target
Here's the script called by the systemd service. Running this script with bash from a command prompt offloads to the GPU as expected, but it doesn't work when started by systemd.
#!/usr/bin/env bash
/home/ubuntu/llama3.1-8b-instruct.llamafile --server --nobrowser -ngl 999 --host 0.0.0.0 -c 0
The llamafile falls back to using the CPU when run as a systemd service.
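My guess (based on the relative ./.llamafile paths in the log output below) is that systemd starts the service with / as the working directory and without HOME set, so llamafile can't find the GPU modules it previously extracted under /home/ubuntu/.llamafile. A minimal sketch of [Service] additions that would pin both (paths taken from the unit above; the cause is my assumption, not confirmed):

[Service]
# Assumption: llamafile resolves its extracted GPU support modules
# against $HOME and/or the working directory of the process.
Environment=HOME=/home/ubuntu
WorkingDirectory=/home/ubuntu

The same pinning could instead be done inside run_llamafile.sh with export HOME=/home/ubuntu and cd "$HOME" before the llamafile invocation.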
Version info:
$ uname -a
Linux 6.2.0-1011-aws #11~22.04.1-Ubuntu SMP Mon Aug 21 16:27:59 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
$ cat /etc/os-release
PRETTY_NAME="Ubuntu 22.04.3 LTS"
NAME="Ubuntu"
VERSION_ID="22.04"
VERSION="22.04.3 LTS (Jammy Jellyfish)"
VERSION_CODENAME=jammy
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=jammy
$ nvidia-smi
Thu Aug 8 18:56:10 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.54.03 Driver Version: 535.54.03 CUDA Version: 12.2 |
Version
llamafile v0.8.11
What operating system are you seeing the problem on?
Linux
Relevant log output
Aug 08 18:42:12 ip-10-128-15-31 run_llamafile.sh[50781]: import_cuda_impl: initializing gpu module...
Aug 08 18:42:12 ip-10-128-15-31 run_llamafile.sh[50781]: get_rocm_bin_path: note: hipcc not found on $PATH
Aug 08 18:42:12 ip-10-128-15-31 run_llamafile.sh[50781]: get_rocm_bin_path: note: $HIP_PATH/bin/hipcc does not exist
Aug 08 18:42:12 ip-10-128-15-31 run_llamafile.sh[50781]: get_rocm_bin_path: note: /opt/rocm/bin/hipcc does not exist
Aug 08 18:42:12 ip-10-128-15-31 run_llamafile.sh[50781]: link_cuda_dso: note: dynamically linking ./.llamafile/v/0.8.11/ggml-rocm.so
Aug 08 18:42:12 ip-10-128-15-31 run_llamafile.sh[50781]: link_cuda_dso: warning: libamdhip64.so.6: cannot open shared object file: No such file or directory: failed to load library
Aug 08 18:42:12 ip-10-128-15-31 run_llamafile.sh[50781]: link_cuda_dso: note: dynamically linking ./.llamafile/v/0.8.11/ggml-cuda.so
Aug 08 18:42:12 ip-10-128-15-31 run_llamafile.sh[50781]: link_cuda_dso: warning: ./.llamafile/v/0.8.11/ggml-cuda.so: cannot open shared object file: No such file or directory: failed to l>
Aug 08 18:42:12 ip-10-128-15-31 run_llamafile.sh[50781]: {"function":"server_params_parse","level":"WARN","line":2437,"msg":"Not compiled with GPU offload support, --n-gpu-layers option w>
Aug 08 18:42:12 ip-10-128-15-31 run_llamafile.sh[50781]: note: if you have an AMD or NVIDIA GPU then you need to pass -ngl 9999 to enable GPU offloading
Aug 08 18:42:12 ip-10-128-15-31 run_llamafile.sh[50781]: {"build":1500,"commit":"a30b324","function":"server_cli","level":"INFO","line":2869,"msg":"build info","tid":"10437056","timestamp>
Aug 08 18:42:12 ip-10-128-15-31 run_llamafile.sh[50781]: {"function":"server_cli","level":"INFO","line":2872,"msg":"system info","n_threads":24,"n_threads_batch":-1,"system_info":"AVX = 1>
Aug 08 18:42:12 ip-10-128-15-31 run_llamafile.sh[50781]: llama_model_loader: loaded meta data with 21 key-value pairs and 291 tensors from Meta-Llama-3.1-8B-Instruct.Q6_K.gguf (version GG>
Aug 08 18:42:12 ip-10-128-15-31 run_llamafile.sh[50781]: llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
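The link_cuda_dso lines above show llamafile looking for ggml-cuda.so at the relative path ./.llamafile/v/0.8.11/, which resolves against the working directory of the process; under an interactive shell that's /home/ubuntu, but systemd defaults to /. One way to test that hypothesis without editing the unit (a sketch; whether these two properties are sufficient is my assumption):

sudo systemd-run --pty -p WorkingDirectory=/home/ubuntu -p Environment=HOME=/home/ubuntu /home/ubuntu/run_llamafile.sh

If that run offloads to the GPU, adding the same WorkingDirectory= and Environment=HOME= lines to the [Service] section above should fix the service.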