chat-with-phi-3-vision
chat-with-phi-3-vision copied to clipboard
Chat with Phi 3.5/3 Vision LLMs. Phi-3.5-vision is a lightweight, state-of-the-art open multimodal model built upon datasets which include - synthetic data and filtered publicly available websites - w...
Overview
Phi-3.5-vision is a lightweight, state-of-the-art open multimodal model built upon datasets which include - synthetic data and filtered publicly available websites - with a focus on very high-quality, reasoning dense data both on text and vision.
This model enables multi-frame image understanding, image comparison, multi-image summarization/storytelling, and video summarization, which have broad applications in office scenarios.
Getting Started
Follow these steps to set up and run the project:
1. Install Dependencies
i. Download and Install NVIDIA CUDA
Visit the NVIDIA CUDA Toolkit Downloads page and follow the instructions to install CUDA compatible with your system.
ii. Install Required Python Packages
Ensure you have all the necessary dependencies installed by running the following commands:
pip install -r requirements.txt
pip install flash_attn
If you encounter any issues while installing flash_attn, refer to the FlashAttention Installation Guide for troubleshooting tips and additional setup details.
2. Start the API Server
Launch the API server powered by LitServe:
python server.py
3. Launch the Streamlit App
Start the Streamlit application with the following command:
streamlit run app.py
About
This project is developed and maintained with ❤️ by Bhimraj Yadav.