Bearsaerker
This is the log with your env variables and -v:
## SPLIT #0: CPU # 0 inputs node # 0 ( GET_ROWS): inp_embd ( 15K) [ CPU ]: token_embd.weight (...
Without the flash attention flag it does not load at all, unfortunately. The command:
./bin/llama-server -m '/home/luis/Downloads/gemma-3-12b-it-Q4_K_M.gguf' --n-gpu-layers -1 --batch_size 1024 --cache-type-k q8_0 --cache-type-v q8_0 -c 8000 --port 7777 -t...
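For reference, a minimal sketch of the same invocation with flash attention turned on, since the quantized q8_0 KV cache apparently only loads with it. The -fa flag (llama.cpp's flash-attention switch) and the <threads> placeholder for the truncated -t value are my additions; everything else is copied from the command above. Note that llama.cpp documents the batch flag as --batch-size (or -b), not --batch_size.

```bash
# Sketch only: flags taken from the post above, plus -fa (flash attention).
# <threads> stands in for the truncated -t value from the original command.
./bin/llama-server \
  -m '/home/luis/Downloads/gemma-3-12b-it-Q4_K_M.gguf' \
  --n-gpu-layers -1 \
  --batch-size 1024 \
  --cache-type-k q8_0 --cache-type-v q8_0 \
  -c 8000 \
  --port 7777 \
  -fa \
  -t <threads>
```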
That changes nothing; the output is the same.
Always include the command you use. What command did you use? Did you set --ocr_all_pages?