LocalAI
LLaMA-7B-q4 inference uses only 4 threads
Jun 8, 2023

LocalAI version:

Environment, CPU architecture, OS, and Version:
Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz

Describe the bug:
Running inference with ggml-model-q4_0.bin, started with:
docker-compose up -d --pull always
curl http://localhost:8080/v1/completions -H "Content-Type: application/json" -d '{
"model": "your-model.bin",
"prompt": "A long time ago in a galaxy far, far away",
"temperature": 0.7
}'
Only 4 CPU cores are used, while the machine has 40 cores on a single socket.
@imajiayu did you set the threads here? https://github.com/go-skynet/LocalAI/blob/6bb562272dada1da893f8fb1bfc768b6d819d2de/.env#L3
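For reference, the setting being pointed at is the `THREADS` variable in the repository's `.env` file, which LocalAI reads at startup. A minimal sketch (the exact line number and default value may differ between versions):

```shell
# .env (repository root)
# Set the number of threads used for inference.
# Note: prefer the number of physical cores over logical cores.
THREADS=40
```

After changing `.env`, recreate the containers so the new value is picked up, e.g. `docker-compose up -d`.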
Thanks a lot.