LocalAI
Accelerate docker build
Description
This should auto-detect the available threads on the system and use all of them to accelerate the build. It has been tested locally on a 12900K with the "llama" backend.
This uses
$(nproc)
Alternatively, the following could be used to count physical cores instead, which would be optimal but simply "looks" overly complicated:
$(lscpu -p | grep -v "#" | sort -u -t, -k 2,4 | wc -l)
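For illustration, a minimal sketch of how either count could be fed to a parallel make invocation; the `build` target name is an assumption, not taken from the LocalAI Makefile:

```sh
# Logical threads (includes hyper-threads/SMT).
THREADS_LOGICAL=$(nproc)
# Physical cores only: unique core entries reported by lscpu.
THREADS_PHYSICAL=$(lscpu -p | grep -v "#" | sort -u -t, -k 2,4 | wc -l)

# Assumed invocation: pass the chosen count to make's -j flag.
make -j"${THREADS_LOGICAL}" build
```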
WARNING
This will use ALL threads the CPU reports, which may trigger a power limiter for some users (but should still be faster).
This PR fixes #: slow build times.
Notes for Reviewers
This needs to be tested to verify that it does not break anything.
I am also not sure whether the current CI failures are caused by this change, since the backend appears to respond correctly (?):
[127.0.0.1]:53416 200 - POST /v1/chat/completions
{"level":"debug","time":"2024-03-19T07:13:25Z","message":"Response: {\"created\":1710832396,\"object\":\"chat.completion\",\"id\":\"ba9a6016-d30d-4c1f-9558-dd75f62617ed\",\"model\":\"rwkv_test\",\"choices\":[{\"index\":0,\"finish_reason\":\"stop\",\"message\":{\"role\":\"assistant\",\"content\":\" No, I can't. My memory is not that good. Could you tell me the day of the week again?\"}}],\"usage\":{\"prompt_tokens\":0,\"completion_tokens\":0,\"total_tokens\":0}}"}
[FAILED] in [It] - /home/runner/work/LocalAI/LocalAI/core/http/api_test.go:815 @ 03/19/24 07:13:25.193
• [FAILED] [52.557 seconds]
API test API query backends [It] runs rwkv chat completion
/home/runner/work/LocalAI/LocalAI/core/http/api_test.go:807
[FAILED] Expected
<string>: No, I can't. My memory is not that good. Could you tell me the day of the week again?
To satisfy at least one of these matchers: [%!s(*matchers.ContainSubstringMatcher=&{Sure []}) %!s(*matchers.ContainSubstringMatcher=&{five []})]
In [It] at: /home/runner/work/LocalAI/LocalAI/core/http/api_test.go:815 @ 03/19/24 07:13:25.193
Env file used for the tested build:
BUILD_TYPE=cublas
THREADS=16
CMAKE_ARGS="-DLLAMA_F16C=ON -DLLAMA_AVX512=OFF -DLLAMA_AVX2=ON -DLLAMA_AVX=ON -DLLAMA_FMA=ON -Dcpu=x86_64"
GALLERIES=[{"name":"model-gallery", "url":"github:go-skynet/model-gallery/index.yaml"}, {"url": "github:go-skynet/model-gallery/huggingface.yaml","name":"huggingface"}]
DEBUG=true
SINGLE_ACTIVE_BACKEND=true
REBUILD=true
BUILD_GRPC_FOR_BACKEND_LLAMA=true
GO_TAGS=tts stablediffusion
PYTHON_GRPC_MAX_WORKERS=3
LLAMACPP_PARALLEL=1
PARALLEL_REQUESTS=true
MODELS_PATH=/build/models
HUGGINGFACE_HUB_CACHE=/build/models/huggingface
WATCHDOG_IDLE=true
WATCHDOG_IDLE_TIMEOUT=1m
IMAGE_PATH=images
SUNO_OFFLOAD_CPU=true
SUNO_USE_SMALL_MODELS=true
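For context, a hedged sketch of how an env file like the one above is typically consumed; the `.env` file name and the compose workflow are assumptions, not taken from this PR:

```sh
# Assumed usage: docker compose loads the env file, then rebuilds and starts the image.
docker compose --env-file .env up -d --build
```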
Signed commits
- [x] Yes, I signed my commits.
Deploy Preview for localai canceled.
| Name | Link |
|---|---|
| Latest commit | 0a27e03167a6e7a0e230b31e51ff9e80e235134e |
| Latest deploy log | https://app.netlify.com/sites/localai/deploys/65f931a09f1d1100084fea73 |
Functionality to allow parallelized builds was added in https://github.com/mudler/LocalAI/pull/1845. You can set the MAKEFLAGS env var to include whatever you need. That comes with the added benefit of not always using every available thread, and of adding load-average-limiting flags (or any other make flags you need or want).
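For example, a hedged sketch of that approach; the flag values are illustrative and the `build` target name is assumed rather than taken from the LocalAI Makefile:

```sh
# -j caps the number of parallel jobs; -l skips starting new jobs
# while the load average exceeds the given value.
export MAKEFLAGS="-j8 -l12"
make build
```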
Going to close this one, as parallelized build support has already been added.