Chat with Phi 3.5/3 Vision LLMs

Overview

Phi-3.5-vision is a lightweight, state-of-the-art open multimodal model built on datasets that include synthetic data and filtered publicly available websites, with a focus on very high-quality, reasoning-dense data for both text and vision.

This model enables multi-frame image understanding, image comparison, multi-image summarization/storytelling, and video summarization, which have broad applications in office scenarios.

Getting Started

Follow these steps to set up and run the project:

1. Install Dependencies

i. Download and Install NVIDIA CUDA
Visit the NVIDIA CUDA Toolkit Downloads page and follow the instructions to install CUDA compatible with your system.
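
After installing CUDA, it can help to confirm that the toolkit is visible from your shell before moving on. The snippet below is a minimal sketch, not part of this project: it assumes a standard CUDA install places `nvidia-smi` (driver utility) and `nvcc` (toolkit compiler) on your PATH, and the helper name `find_cuda_tools` is made up for illustration.

```python
import shutil


def find_cuda_tools():
    """Map each CUDA command-line tool to its path, or None if it is not on PATH.

    Assumption: a typical CUDA install exposes `nvidia-smi` and `nvcc`.
    """
    return {tool: shutil.which(tool) for tool in ("nvidia-smi", "nvcc")}


if __name__ == "__main__":
    for tool, path in find_cuda_tools().items():
        print(f"{tool}: {path or 'not found on PATH'}")
```

If either tool is reported as missing, the CUDA `bin` directory may not be on your PATH yet, or the install may not have completed.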

ii. Install Required Python Packages
Ensure you have all the necessary dependencies installed by running the following commands:

pip install -r requirements.txt
pip install flash_attn

If you encounter any issues while installing flash_attn, refer to the FlashAttention Installation Guide for troubleshooting tips and additional setup details.
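
To verify the installation, you can check that the key packages are importable. This is a rough sketch under assumptions: the package list below is a guess based on the tools named in this README (`flash_attn`, `litserve`, `streamlit`) and may not match everything in `requirements.txt`; `missing_packages` is a hypothetical helper, not part of the project.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level module names from `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Assumed import names; adjust to match your requirements.txt.
    expected = ["flash_attn", "litserve", "streamlit"]
    missing = missing_packages(expected)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All expected packages are importable.")
```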

2. Start the API Server

Launch the API server powered by LitServe:

python server.py
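
Loading the model can take a while, so you may want to confirm the server is accepting connections before starting the app. The sketch below only checks that a TCP port is open. It assumes the server listens on `127.0.0.1:8000`, which is LitServe's usual default; check `server.py` for the actual port. The helper `wait_for_port` is illustrative and not part of the project.

```python
import socket
import time


def wait_for_port(host="127.0.0.1", port=8000, timeout=30.0, interval=0.5):
    """Poll until a TCP connection to (host, port) succeeds or `timeout` elapses.

    Assumption: port 8000 is the server port; change it if server.py differs.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            time.sleep(interval)
    return False


if __name__ == "__main__":
    if wait_for_port():
        print("Server is accepting connections.")
    else:
        print("Server did not come up in time.")
```

An open port shows only that the process is listening, not that the model has finished loading, so the first request may still be slow.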

3. Launch the Streamlit App

Start the Streamlit application with the following command:

streamlit run app.py

About

This project is developed and maintained with ❤️ by Bhimraj Yadav.
