gemini-vision-pro
Google Gemini Vision Web application with Speech and Text
[Image: Gemini Vision Pro Logo]
🚀 Description: Google Gemini Vision Pro 📸 scans images, generates descriptions using the Gemini AI Pro Vision API, and reads them back as speech 🗣️. It can also capture images from the webcam 🖥️.
🌟 Introduction 🌟
Google Gemini Vision Pro is a versatile application that combines image processing 🖼️, speech recognition 🎤, and text-to-speech capabilities 📢. With this application, you can capture images using your webcam 📷, convert spoken words to text 📝, generate image descriptions 📚, and even have the descriptions spoken back to you 📣.
Installation Guide
Step 1: Clone the repository
git clone https://github.com/haseeb-heaven/Gemini-Vision-Pro
cd Gemini-Vision-Pro
Step 2: Install the dependencies
pip install -r requirements.txt
Step 3: Run the application
streamlit run script.py
Step 4: Obtain the Google AI Studio API key and set up the application
- Visit Google AI Studio.
- Click the Create API Key button.
- Copy the generated key and paste it into the application settings.
- The application cannot work without the API key. Keep it safe and do not share it with anyone.
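For keeping the key out of source files and screenshots, one option is to read it from an environment variable and only ever display a masked form. The sketch below is illustrative, not part of the application: the variable name `GEMINI_API_KEY` and the helpers `load_api_key` and `mask_key` are hypothetical.

```python
import os


def load_api_key(env_var: str = "GEMINI_API_KEY") -> str:
    """Read the API key from an environment variable (the name is an assumption)."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"No API key found in {env_var}. Create one in Google AI Studio first."
        )
    return key


def mask_key(key: str, visible: int = 4) -> str:
    """Hide all but the last few characters so the key can be logged or shown safely."""
    if len(key) <= visible:
        return "*" * len(key)
    return "*" * (len(key) - visible) + key[-visible:]
```

With the variable exported in your shell, `mask_key(load_api_key())` gives a value that is safe to print when checking which key is active.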
Gemini AI settings:
[Image: Gemini Settings]
AI Sections
The core AI sections of this project include:
- 📷 Webcam detection using WebRTC, OpenCV, and PIL
- 🗣️ Speech-to-text conversion using Google Cloud Speech-to-Text API
- 🎙️ Text-to-speech conversion using Google Cloud Text-to-Speech API
- 📸 Image processing using Gemini AI Pro Vision API
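The image-description step in the list above could look roughly like the sketch below. It assumes the `google-generativeai` Python SDK and the `gemini-pro-vision` model name, neither of which is confirmed by this README. The helper names `make_gemini_model` and `describe_image` and the default prompt are hypothetical. The image is expected to be a PIL image, such as a captured webcam frame.

```python
from typing import Any

# Hypothetical default prompt; the application may use different wording.
DEFAULT_PROMPT = "Describe this image in detail."


def make_gemini_model(api_key: str, model_name: str = "gemini-pro-vision") -> Any:
    """Configure the SDK and return a model object (assumes google-generativeai is installed)."""
    import google.generativeai as genai  # third-party: pip install google-generativeai

    genai.configure(api_key=api_key)
    return genai.GenerativeModel(model_name)


def describe_image(model: Any, image: Any, prompt: str = DEFAULT_PROMPT) -> str:
    """Send the prompt and the image together and return the generated description."""
    response = model.generate_content([prompt, image])
    return response.text.strip()
```

Passing the model in as an argument keeps `describe_image` easy to exercise with a stand-in object, without a network call or a real key.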
Features
- 📷 Webcam detection with real-time image capture
- 🗣️ Speech-to-text conversion for spoken words
- 🎙️ Text-to-speech for generating spoken descriptions
- 📸 Image processing using AI to provide detailed descriptions
- 📝 Logging using Python's logging module
- ⚙️ Error handling with Python's exception handling
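The logging and error-handling features listed above can be combined in a small wrapper, sketched here with the standard library only. The logger name `gemini_vision_pro` and the helper `safe_call` are hypothetical and are not taken from the project's code.

```python
import logging

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(name)s: %(message)s",
)
logger = logging.getLogger("gemini_vision_pro")  # hypothetical logger name


def safe_call(step_name, func, *args, default=None, **kwargs):
    """Run one pipeline step, log the outcome, and return `default` if it raises."""
    try:
        result = func(*args, **kwargs)
        logger.info("%s succeeded", step_name)
        return result
    except Exception:
        # logger.exception records the full traceback at ERROR level.
        logger.exception("%s failed", step_name)
        return default
```

A step such as image capture or description could then be wrapped as `safe_call("describe image", some_function, frame, default="")`, so a failure is logged without stopping the app.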
WebUI - Application Showcase
YouTube demo:
Webcam with live feed:
[Image: Webcam with live feed]
Gemini AI Vision demo with a cap as the object:
[Image: Gemini AI Vision Cap]
Gemini AI Vision demo with a hand:
[Image: Gemini AI Vision Hand]
Gemini AI Vision demo with a gesture:
[Image: Gemini AI Vision Gesture]
Packages Used
This project relies on various Python packages, including:
- Streamlit - A web app framework used to build the application
- streamlit-webrtc - Streams webcam video into the Streamlit app for image capture
- OpenCV - Utilized for webcam image capture
- PIL (Pillow) - Used for image processing and conversion
- gTTS (Google Text-to-Speech) - Converts text to speech
- SpeechRecognition - Converts speech to text
- google.cloud.speech - Part of Google Cloud services for speech-to-text conversion
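As an illustration of how gTTS from the list above can turn a generated description into audio, here is a minimal sketch. The helper `text_to_mp3_bytes` and its `tts_factory` parameter are hypothetical. By default the function uses `gTTS(...).write_to_fp(...)` from the gTTS package, which needs network access.

```python
import io
from typing import Any, Callable, Optional


def text_to_mp3_bytes(
    text: str,
    lang: str = "en",
    tts_factory: Optional[Callable[..., Any]] = None,
) -> bytes:
    """Convert text to MP3 bytes in memory, using gTTS unless another factory is given."""
    if not text.strip():
        raise ValueError("Nothing to speak: text is empty.")
    if tts_factory is None:
        from gtts import gTTS  # third-party: pip install gTTS

        tts_factory = gTTS
    buffer = io.BytesIO()
    # gTTS writes MP3 data to any binary file-like object.
    tts_factory(text=text, lang=lang).write_to_fp(buffer)
    return buffer.getvalue()
```

In a Streamlit page the returned bytes can be played with `st.audio(data, format="audio/mp3")`, which avoids writing a temporary file.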
📚 Links and References
Follow these links for Google Gemini Vision Pro related content:
Versioning
- Version 1.0: Initial release
Contributing
We welcome contributions! Please follow our Contribution Guidelines to get started.
License
This project is licensed under the MIT License - see the LICENSE file for details.
Author
- HeavenHM
- Date: 17-12-2023