hossam noaman

Results: 6 comments by hossam noaman

After waiting for about 5 minutes:

```
hossam@hossam:~$ ./llamafiler --gpu nvidia --server --v2 -m /home/hossam/ai-models/Qwen/Qwen2.5-Coder-14B-Instruct-GGUF/qwen2.5-coder-14b-instruct-q4_0.gguf --verbose
import_cuda_impl: initializing gpu module...
extract_cuda_dso: note: prebuilt binary /zip/ggml-cuda.so not found
compile_nvidia: note: building ggml-cuda...
```

Using `-ngl 30` solved the OOM-killer problem; see the sketch below.
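
For reference, a sketch of the full invocation with the layer limit applied, assuming `-ngl` is simply appended to the same command shown above:

```sh
# Offload only 30 layers to the GPU so the process stays within
# memory limits; remaining layers run on the CPU.
./llamafiler --gpu nvidia --server --v2 \
  -m /home/hossam/ai-models/Qwen/Qwen2.5-Coder-14B-Instruct-GGUF/qwen2.5-coder-14b-instruct-q4_0.gguf \
  -ngl 30 --verbose
```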

After sending the first prompt on http://127.0.0.1:8080 there was no reply, and I found this in the terminal:

```
2025-04-05T01:12:45.424263 llamafile/server/client.cpp:679 34124 GET /favicon.ico
2025-04-05T01:12:45.424791 llamafile/server/client.cpp:801 34124 served /zip/www/favicon.ico
2025-04-05T01:12:48.929630 llamafile/server/client.cpp:292 34124 get "/v1/chat/completions"...
```
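
For anyone who wants to hit the endpoint directly instead of going through the web UI, something like this should exercise the same code path (the JSON body is my assumption of the usual OpenAI-style payload, not taken from the log):

```sh
# Hypothetical reproduction: POST a chat completion straight to the
# server, bypassing the browser. Payload contents are an assumption.
curl -v http://127.0.0.1:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages":[{"role":"user","content":"hello"}]}'
```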

CPU-only mode works without any crashes or problems at all.

I can confirm this bug; the same happens for me.