Vision Core AI
Demo Python script app to interact with a llama.cpp server using the Whisper API, a microphone, and a webcam.
Step 1: Install llama.cpp and package dependencies on your machine
Clone the llama.cpp repository from GitHub:
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
On macOS:
Build with make:
make
Or, if you prefer cmake (configure first, then build):
cmake -B build
cmake --build build --config Release
macOS requirements
You need to install these dependencies on your machine: ffmpeg and portaudio
brew install ffmpeg portaudio
Also be sure to grant the terminal the necessary permissions under Security & Privacy > Privacy.
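To verify that portaudio is installed and your microphone is visible, you can run a quick check like the sketch below. It only enumerates input devices; the device names printed will vary per machine.

```python
import pyaudio  # wraps the portaudio library installed above

# Quick check that portaudio can see your input (microphone) devices.
pa = pyaudio.PyAudio()
for i in range(pa.get_device_count()):
    info = pa.get_device_info_by_index(i)
    if info["maxInputChannels"] > 0:
        print(i, info["name"])
pa.terminate()
```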
Step 2: Download the Model!
- Download these two files from Hugging Face - mys/ggml_bakllava-1:
  - ggml-model-q4_k.gguf (or any other quantized model) - only one is required!
  - mmproj-model-f16.gguf
- Copy the paths of those two files.
- Run this in the llama.cpp repository (replace YOUR_PATH with the paths to the files you downloaded):
macOS
./server -m YOUR_PATH/ggml-model-q4_k.gguf --mmproj YOUR_PATH/mmproj-model-f16.gguf -ngl 1
Windows
server.exe -m REPLACE_WITH_YOUR_PATH\ggml-model-q4_k.gguf --mmproj REPLACE_WITH_YOUR_PATH\mmproj-model-f16.gguf -ngl 1
The llama.cpp server is now up and running!
⚠️ NOTE: Keep the server running in the background.
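With the server up, you can optionally smoke-test it from Python before launching the demo. This is a minimal sketch, assuming the server's default address http://localhost:8080 and the /completion endpoint's image_data field for multimodal (LLaVA-style) models; photo.jpg is any local test image.

```python
import base64
import requests  # pip install requests

# Minimal smoke test for the running llama.cpp server (assumptions:
# default address http://localhost:8080, /completion endpoint with
# base64 image_data referenced as [img-1] in the prompt).
with open("photo.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

response = requests.post(
    "http://localhost:8080/completion",
    json={
        "prompt": "USER:[img-1] Describe this image.\nASSISTANT:",
        "image_data": [{"data": image_b64, "id": 1}],
        "n_predict": 128,
    },
)
print(response.json()["content"])
```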
Next, let's run the script that uses the webcam and microphone.
Step 3: Running the Demo
Open a new terminal window and clone the demo app:
git clone https://github.com/herrera-luis/vision-core-ai.git
cd vision-core-ai
Install the Python dependencies:
pip install -r requirements.txt
Run the main script:
python main.py
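Under the hood, the demo records microphone audio and transcribes it with Whisper before talking to the llama.cpp server. The sketch below is a hypothetical illustration of that capture-and-transcribe step, not the actual main.py; it assumes the open-source openai-whisper package, the pyaudio bindings for the portaudio library installed earlier, and illustrative choices for the prompt.wav filename, clip length, and "base" model size. Whisper also relies on the ffmpeg dependency from Step 1 to decode audio.

```python
import wave

import pyaudio  # backed by the portaudio package installed above
import whisper  # pip install openai-whisper

# Hypothetical capture-and-transcribe step; the real main.py may differ.
CHUNK, RATE, SECONDS = 1024, 16000, 5

# Record a short clip from the default microphone.
pa = pyaudio.PyAudio()
stream = pa.open(format=pyaudio.paInt16, channels=1, rate=RATE,
                 input=True, frames_per_buffer=CHUNK)
frames = [stream.read(CHUNK) for _ in range(int(RATE / CHUNK * SECONDS))]
stream.stop_stream()
stream.close()
pa.terminate()

# Write the raw frames to a WAV file Whisper can read.
with wave.open("prompt.wav", "wb") as wf:
    wf.setnchannels(1)
    wf.setsampwidth(pyaudio.get_sample_size(pyaudio.paInt16))
    wf.setframerate(RATE)
    wf.writeframes(b"".join(frames))

# Transcribe the clip locally with Whisper.
model = whisper.load_model("base")
print(model.transcribe("prompt.wav")["text"])
```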
How to interact with the app
While the application is running, press the i or c key once to start recording, then press the same key again to stop it:
- i will use your webcam
- c will use chat
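For a sense of how such a toggle can be wired up, here is a minimal, hypothetical sketch of a key-driven webcam capture loop using OpenCV (opencv-python). It is not the actual main.py; the window title, snapshot.jpg filename, and q quit key are illustrative assumptions.

```python
import cv2  # pip install opencv-python

# Hypothetical sketch of the i-key toggle; the real main.py may differ.
cap = cv2.VideoCapture(0)  # default webcam
recording = False

while True:
    ok, frame = cap.read()
    if not ok:
        break
    cv2.imshow("vision-core-ai", frame)
    key = cv2.waitKey(1) & 0xFF
    if key == ord("i"):
        recording = not recording  # first press starts, second press stops
        if not recording:
            cv2.imwrite("snapshot.jpg", frame)  # frame handed to the server
    elif key == ord("q"):  # illustrative quit key
        break

cap.release()
cv2.destroyAllWindows()
```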