babyagi
Please add GPU support for this script. Whenever I use it with custom LLMs or Llama, generation takes far too long because it doesn't utilize the GPU.
This is perhaps not as fast as performing all model computation on the GPU, but you can get a substantial boost in generation rate if you compile llama.cpp with cuBLAS enabled and use it with the existing script.
What are the steps to add this?
I struggled to do this myself, and the steps will vary from system to system, so your mileage may vary. That said, you can try something like this:
1. Obtain the source code. Within your favorite repository directory:
git clone https://github.com/abetlen/llama-cpp-python.git
cd llama-cpp-python/vendor
git clone https://github.com/ggerganov/llama.cpp.git
2. Set LLAMA_CUBLAS to ON in llama-cpp-python/CMakeLists.txt.
3. Run python setup.py install or pip install --upgrade . inside llama-cpp-python (or see the one-liner sketch below).
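Alternatively, recent llama-cpp-python releases can pick up the cuBLAS flag from CMAKE_ARGS at install time (the same trick the Dockerfile below uses), so you may not need to edit CMakeLists.txt at all. A rough sketch, assuming the CUDA toolkit and a working compiler are already installed:

```bash
# Build llama-cpp-python from source with cuBLAS enabled.
# FORCE_CMAKE=1 forces a source build even if a prebuilt wheel is available;
# the exact CMake flag may differ depending on your llama.cpp version.
CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 \
    pip install llama-cpp-python --upgrade --force-reinstall --no-cache-dir
```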
Add these files to the repo and...
Dockerfile.llamacpp
FROM nvidia/cuda:11.8.0-devel-ubuntu22.04
WORKDIR /tmp
RUN --mount=type=cache,target=/var/cache/apt apt-get update && apt-get install -y \
python3 \
python-is-python3 \
python3-pip \
python3-dev \
python3-venv \
build-essential \
wget \
unzip \
git \
ffmpeg \
&& rm -rf /var/lib/apt/lists/*
RUN mkdir -p /etc/OpenCL/vendors && echo "libnvidia-opencl.so.1" > /etc/OpenCL/vendors/nvidia.icd
WORKDIR /app
ENV CUDA_DOCKER_ARCH=all
ENV LLAMA_CUBLAS=1
ENV NVIDIA_VISIBLE_DEVICES=all
# Install dependencies
RUN --mount=type=cache,target=/root/.cache/pip python3 -m pip install --upgrade pip pytest cmake scikit-build setuptools fastapi uvicorn sse-starlette pydantic-settings chromadb
# Install llama-cpp-python (build with cuda)
RUN CMAKE_ARGS="-DLLAMA_CUBLAS=on" pip install llama-cpp-python
COPY requirements.txt /tmp/requirements.txt
RUN --mount=type=cache,target=/root/.cache/pip pip install -r /tmp/requirements.txt
RUN --mount=type=cache,target=/root/.cache/pip pip install -U chromadb
ENV LLAMA_MODEL_PATH="/app/llama-model/phind-codellama-34b-v2.Q4_K_M.gguf"
ENV LLM_MODEL="llama"
ENTRYPOINT ["./babyagi.py"]
#WORKDIR /app/babycoder
#ENTRYPOINT ["python", "./babycoder.py"]
docker-compose.override.yml
services:
  babyagi-llama:
    build:
      context: ./
      dockerfile: Dockerfile.llamacpp
    container_name: babyagi
    volumes:
      - "./:/app"
    stdin_open: true
    tty: true
    ulimits:
      memlock: -1
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              device_ids: ['0', '1']
              capabilities: [gpu]
Run it with docker compose up --build babyagi-llama.
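Before starting the stack, it may be worth checking that your container runtime can see the GPUs at all. A quick sanity check, assuming the NVIDIA Container Toolkit is installed on the host:

```bash
# Verify GPU passthrough into containers (should print the usual nvidia-smi table)
docker run --rm --gpus all nvidia/cuda:11.8.0-base-ubuntu22.04 nvidia-smi
```

If that works, start the service and watch the startup logs: with cuBLAS active, llama.cpp should report something like BLAS = 1 in its system info and a line about layers being offloaded to the GPU.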
Ah, you also need to set n_gpu_layers in babyagi.py (maybe read it from an env variable like the others? see the sketch after the snippet below):
print('Initialize model for evaluation')
llm = Llama(
    model_path=LLAMA_MODEL_PATH,
    n_ctx=CTX_MAX,
    n_threads=LLAMA_THREADS_NUM,
    n_batch=512,
    use_mlock=False,
    n_gpu_layers=43  # number of layers to offload to the GPU; tune for your model and VRAM
)
print('\nInitialize model for embedding')
llm_embed = Llama(
    model_path=LLAMA_MODEL_PATH,
    n_ctx=CTX_MAX,
    n_threads=LLAMA_THREADS_NUM,
    n_batch=512,
    embedding=True,
    use_mlock=False,
    n_gpu_layers=43
)
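A minimal sketch of reading that value from the environment, in the same style as the other settings in babyagi.py (the variable name LLAMA_N_GPU_LAYERS is not in the original script, just an illustration):

```python
import os

# Hypothetical env variable, mirroring how LLAMA_MODEL_PATH etc. are read;
# 0 keeps everything on the CPU, and in recent llama-cpp-python versions
# -1 offloads all layers.
LLAMA_N_GPU_LAYERS = int(os.getenv("LLAMA_N_GPU_LAYERS", "43"))

llm = Llama(
    model_path=LLAMA_MODEL_PATH,
    n_ctx=CTX_MAX,
    n_threads=LLAMA_THREADS_NUM,
    n_batch=512,
    use_mlock=False,
    n_gpu_layers=LLAMA_N_GPU_LAYERS,
)
```

The same value can then be passed to the embedding model, and set from the Dockerfile or docker-compose.override.yml with an ENV/environment entry like the other configuration.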