alpaca.cpp
Painfully slow
I managed to get alpaca running in a Hyper-V VM on my PowerEdge R710. The VM has 8 cores and 16 GB of RAM, running Ubuntu 22.04. I had to build chat from source with make; otherwise I got an Illegal Instruction error. The problem is that it takes about a minute per token. How can I improve this?
main: seed = 1680482599
llama_model_load: loading model from 'ggml-alpaca-7b-q4.bin' - please wait ...
llama_model_load: ggml ctx size = 6065.34 MB
llama_model_load: memory_size = 2048.00 MB, n_mem = 65536
llama_model_load: loading model part 1/1 from 'ggml-alpaca-7b-q4.bin'
llama_model_load: .................................... done
llama_model_load: model size = 4017.27 MB / num tensors = 291

system_info: n_threads = 4 / 8 | AVX = 0 | AVX2 = 0 | AVX512 = 0 | FMA = 0 | NEON = 0 | ARM_FMA = 0 | F16C = 0 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 |
main: interactive mode on.
sampling parameters: temp = 0.100000, top_k = 40, top_p = 0.950000, repeat_last_n = 64, repeat_penalty = 1.300000
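For reference, the build-from-source steps mentioned above look roughly like this. This is a sketch assuming the antimatter15/alpaca.cpp repository and its standard make target; the -t flag (inherited from llama.cpp) sets the thread count, which the log shows defaulting to 4 of the VM's 8 cores:

```shell
# Clone and build alpaca.cpp from source, so the binary is compiled
# for the host CPU's actual instruction set (the log shows AVX = 0,
# so a prebuilt AVX binary would hit Illegal Instruction)
git clone https://github.com/antimatter15/alpaca.cpp
cd alpaca.cpp
make chat

# Run with all 8 VM cores instead of the default 4 threads
./chat -m ggml-alpaca-7b-q4.bin -t 8
```

Note in the system_info line that AVX, AVX2, FMA, and F16C are all 0, with only SSE3 available, so the ggml matrix kernels are running on their slowest scalar/SSE paths regardless of thread count.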