
Unable to run on Windows: infinite loop with some messages

Open · dims12 opened this issue 1 year ago · 3 comments

Running the file on Windows gives me endless repetitions of this message:

$ llava-v1.5-7b-q4.exe
note: if you have an AMD or NVIDIA GPU then you need to pass -ngl 9999 to enable GPU offloading
note: if you have an AMD or NVIDIA GPU then you need to pass -ngl 9999 to enable GPU offloading
note: if you have an AMD or NVIDIA GPU then you need to pass -ngl 9999 to enable GPU offloading
note: if you have an AMD or NVIDIA GPU then you need to pass -ngl 9999 to enable GPU offloading
...

If I provide the flag, it spams another message instead:

$ llava-v1.5-7b-q4.exe -ngl 9999
import_cuda_impl: initializing gpu module...
get_rocm_bin_path: note: amdclang++.exe not found on $PATH
get_rocm_bin_path: note: $HIP_PATH/bin/amdclang++.exe does not exist
get_rocm_bin_path: note: /opt/rocm/bin/amdclang++.exe does not exist
get_rocm_bin_path: note: clang++.exe not found on $PATH
get_rocm_bin_path: note: $HIP_PATH/bin/clang++.exe does not exist
get_rocm_bin_path: note: /opt/rocm/bin/clang++.exe does not exist
link_cuda_dso: note: dynamically linking C:\Users\Dmitry/.llamafile/ggml-rocm.dll
link_cuda_dso: warning: library not found: failed to load library
get_nvcc_path: note: nvcc.exe not found on $PATH
get_nvcc_path: note: $CUDA_PATH/bin/nvcc.exe does not exist
get_nvcc_path: note: /opt/cuda/bin/nvcc.exe does not exist
get_nvcc_path: note: /usr/local/cuda/bin/nvcc.exe does not exist
link_cuda_dso: note: dynamically linking C:\Users\Dmitry/.llamafile/ggml-cuda.dll
ggml_cuda_link: welcome to CUDA SDK with tinyBLAS
link_cuda_dso: GPU support linked
ggml_init_cublas: GGML_CUDA_FORCE_MMQ:   no
ggml_init_cublas: CUDA_USE_TENSOR_CORES: yes
ggml_init_cublas: found 1 CUDA devices:
  Device 0: NVIDIA GeForce GTX 1070, compute capability 6.1, VMM: yes
link_cuda_dso: GPU support loaded
{"timestamp":1711288382,"level":"INFO","function":"server_cli","line":2457,"message":"build info","build":1500,"commit":"a30b324"}
{"timestamp":1711288382,"level":"INFO","function":"server_cli","line":2457,"message":"build info","build":1500,"commit":"a30b324"}
{"timestamp":1711288382,"level":"INFO","function":"server_cli","line":2457,"message":"build info","build":1500,"commit":"a30b324"}
{"timestamp":1711288382,"level":"INFO","function":"server_cli","line":2457,"message":"build info","build":1500,"commit":"a30b324"}
...
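
In case it helps, the runaway output can be captured to a file for a report and the process stopped with Ctrl-C; a minimal sketch, assuming a bash-style shell on Windows such as Git Bash (matching the $ prompt above), with the log filename being my own choice:

$ llava-v1.5-7b-q4.exe -ngl 9999 > llamafile.log 2>&1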

What can be done?

dims12 · Mar 24 '24 13:03

Are you using ConEmu? See https://github.com/Mozilla-Ocho/llamafile/issues/57#issuecomment-1846694678. What llamafile version are you using?

jart · Mar 24 '24 18:03

For me, with v0.7.1, turning off ConEmu's ConEmuHk.dll injection solved the issue.
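
If anyone wants to test whether ConEmu's hooking is involved without changing its settings, one option is to launch the llamafile from a standalone cmd.exe window, where ConEmuHk.dll should not be injected; a rough sketch with a hypothetical path:

C:\Users\you> llava-v1.5-7b-q4.exe -ngl 9999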

ZoomRmc · Apr 17 '24 11:04

@jart I am not using ConEmu, and I have exactly the same issue. I have an NVIDIA GTX 980 GPU. It's damn old, I know, so maybe that's where the problem is... the hardware is too old, maybe. I tried on Windows and on WSL2, with the same problem. Well, it does run on the GPU, but only at 2.3 tokens per second for me; it's much faster on the CPU, at 12-14 tokens per second. I'm assuming it's because it's using some kind of slower CUDA DLL for the GPU...

version: llamafile-0.8.1
model: Phi-3-mini-4k-instruct-q4.gguf

It should definitely run faster on a GPU, no?

But then, my hardware is quite old.
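
One thing I noticed: the "welcome to CUDA SDK with tinyBLAS" line in the log at the top of this issue means llamafile is running its bundled tinyBLAS kernels rather than a natively compiled cuBLAS module, which could plausibly explain a GPU trailing the CPU. A sketch of what might be worth trying, assuming the CUDA toolkit and a host compiler are installed so llamafile can rebuild its GPU module (--recompile is described in llamafile's docs; please verify it against your version):

$ llamafile-0.8.1 -m Phi-3-mini-4k-instruct-q4.gguf -ngl 9999 --recompile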

snufas · May 01 '24 19:05