os360

3 comments by os360

Fails with or without amb/ub/b/mla and with or without the draft model. Works without flash attention at about half speed. Fails with --no-warmup -fa -fmoe -thp --parallel 2 but works...

Still fails the same way with "--parallel 2" on:
$ ./build/bin/llama-server --version
version: 3900 (e94d1a92) built with cc (Gentoo 14.3.0 p8) 14.3.0 for x86_64-pc-linux-gnu

unsloth/GLM-4.6-UD-Q4_K_XL still fails with --parallel 2 but works with --parallel 1:
/build/bin/llama-server --model /home/trunk/Public/AI/models/unsloth/GLM-4.6-GGUF/UD-Q4_K_XL/GLM-4.6-UD-Q4_K_XL-00001-of-00005.gguf --metrics --host 0.0.0.0 --parallel 2
$ ./build/bin/llama-server --version
version: 4015 (912c98f6) built with cc (Gentoo...