LocalAI
Accelerate docker build
Description
This should auto-detect the available threads on the system and use all of them to accelerate the build. It has been tested locally on a 12900K with the "llama" backend.
This uses
$(nproc)
Alternatively, the following could be used to count physical cores instead, which would be optimal but simply "looks" overly complicated:
$(lscpu -p | grep -v "#" | sort -u -t, -k 2,4 | wc -l)
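For illustration, a minimal sketch of how either count could be fed to a parallel make invocation; the `build` target name is an assumption, not taken from the LocalAI Makefile:

```sh
# Logical threads (includes hyper-threads/SMT).
THREADS_LOGICAL=$(nproc)
# Physical cores only: unique core entries reported by lscpu.
THREADS_PHYSICAL=$(lscpu -p | grep -v "#" | sort -u -t, -k 2,4 | wc -l)

# Assumed invocation: pass the chosen count to make's -j flag.
make -j"${THREADS_LOGICAL}" build
```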
WARNING
This will use ALL threads the CPU reports, which may trigger a power limiter for some users (but should still be faster).
This PR fixes #: slow build times.
Notes for Reviewers
This needs to be tested to verify that it does not break anything.
I am also not sure whether the current CI failures are caused by this change, since the backend appears to respond correctly (?):
[127.0.0.1]:53416 200 - POST /v1/chat/completions
{"level":"debug","time":"2024-03-19T07:13:25Z","message":"Response: {\"created\":1710832396,\"object\":\"chat.completion\",\"id\":\"ba9a6016-d30d-4c1f-9558-dd75f62617ed\",\"model\":\"rwkv_test\",\"choices\":[{\"index\":0,\"finish_reason\":\"stop\",\"message\":{\"role\":\"assistant\",\"content\":\" No, I can't. My memory is not that good. Could you tell me the day of the week again?\"}}],\"usage\":{\"prompt_tokens\":0,\"completion_tokens\":0,\"total_tokens\":0}}"}
[FAILED] in [It] - /home/runner/work/LocalAI/LocalAI/core/http/api_test.go:815 @ 03/19/24 07:13:25.193
• [FAILED] [52.557 seconds]
API test API query backends [It] runs rwkv chat completion
/home/runner/work/LocalAI/LocalAI/core/http/api_test.go:807
[FAILED] Expected
<string>: No, I can't. My memory is not that good. Could you tell me the day of the week again?
To satisfy at least one of these matchers: [%!s(*matchers.ContainSubstringMatcher=&{Sure []}) %!s(*matchers.ContainSubstringMatcher=&{five []})]
In [It] at: /home/runner/work/LocalAI/LocalAI/core/http/api_test.go:815 @ 03/19/24 07:13:25.193
Env file used for the tested build:
BUILD_TYPE=cublas
THREADS=16
CMAKE_ARGS="-DLLAMA_F16C=ON -DLLAMA_AVX512=OFF -DLLAMA_AVX2=ON -DLLAMA_AVX=ON -DLLAMA_FMA=ON -Dcpu=x86_64"
GALLERIES=[{"name":"model-gallery", "url":"github:go-skynet/model-gallery/index.yaml"}, {"url": "github:go-skynet/model-gallery/huggingface.yaml","name":"huggingface"}]
DEBUG=true
SINGLE_ACTIVE_BACKEND=true
REBUILD=true
BUILD_GRPC_FOR_BACKEND_LLAMA=true
GO_TAGS=tts stablediffusion
PYTHON_GRPC_MAX_WORKERS=3
LLAMACPP_PARALLEL=1
PARALLEL_REQUESTS=true
MODELS_PATH=/build/models
HUGGINGFACE_HUB_CACHE=/build/models/huggingface
WATCHDOG_IDLE=true
WATCHDOG_IDLE_TIMEOUT=1m
IMAGE_PATH=images
SUNO_OFFLOAD_CPU=true
SUNO_USE_SMALL_MODELS=true
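For context, a hedged sketch of how an env file like the one above is typically consumed; the `.env` file name and the compose workflow are assumptions, not taken from this PR:

```sh
# Assumed usage: docker compose loads the env file, then rebuilds and starts the image.
docker compose --env-file .env up -d --build
```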
Signed commits
- [x] Yes, I signed my commits.
Deploy Preview for localai canceled.
| Name | Link |
|---|---|
| Latest commit | 0a27e03167a6e7a0e230b31e51ff9e80e235134e |
| Latest deploy log | https://app.netlify.com/sites/localai/deploys/65f931a09f1d1100084fea73 |
Functionality to allow parallelized builds was added in https://github.com/mudler/LocalAI/pull/1845. You can set the MAKEFLAGS env var to include whatever you need. That comes with the added benefit of not always using every available thread, and of adding load-average-limiting flags (or any other make flags you need or want).
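For example, a hedged sketch of that approach; the flag values are illustrative and the `build` target name is assumed rather than taken from the LocalAI Makefile:

```sh
# -j caps the number of parallel jobs; -l skips starting new jobs
# while the load average exceeds the given value.
export MAKEFLAGS="-j8 -l12"
make build
```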
Going to close this one, as parallelized build support has already been added.