Added text image multimodal bot - ISSUE 549
Added Gemini Multimodal (Text+Image) Tutorial - Issue #549
Overview
This PR adds a comprehensive tutorial notebook demonstrating how to build a multimodal chatbot using Google's Gemini API. The notebook provides step-by-step guidance on building a solution that can process both text and images simultaneously.
Features Covered
- Natural language text processing
- Image analysis capabilities
- Combined text+image input handling (true multimodal interaction)
- Conversation history management with visual context
- Response formatting with markdown
- Real-time response streaming
- Parameter customization for different use cases
- Basic error handling and rate limiting
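As a rough illustration of the last item, error handling around rate limits is often done with retries and exponential backoff. This is a generic sketch, not the notebook's code: `RateLimitError` and `call_with_backoff` are hypothetical names, and the real exception type depends on the API client in use.

```python
import random
import time


class RateLimitError(Exception):
    """Hypothetical stand-in for whatever quota error the API client raises."""


def call_with_backoff(fn, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call fn(), retrying on RateLimitError with exponential backoff and jitter."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise  # out of retries: surface the error to the caller
            # Delay doubles on each attempt (base, 2*base, 4*base, ...) plus jitter.
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.1)
            sleep(delay)
```

The `sleep` parameter is injectable so the helper can be exercised without actually waiting.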
Technical Requirements
- Google API key for Gemini
- Python 3.9+
- Libraries: google-generativeai, pillow, IPython, pandas, matplotlib
Why This Matters
This tutorial fills a documentation gap by showing how to seamlessly integrate text and vision capabilities in a single application, properly manage conversation history with visual context, and optimize parameters for different response types.
Testing Done
- Verified all code samples execute properly
- Tested with various image types and prompt combinations
- Ensured compatibility with current API version
Colab Link
View / edit / reply to this conversation on ReviewNB
Giom-V commented on 2025-03-24T15:22:43Z ----------------------------------------------------------------
Line #16. 'Input Types': model.input_token_limit if hasattr(model, 'input_token_limit') else 'Unknown',
I think "Input Types" should be "Input limit", since the value shown is the model's input token limit.
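A minimal sketch of the relabeled entry, assuming the model object may or may not expose `input_token_limit`. `describe_model` is a hypothetical helper, and the `SimpleNamespace` objects stand in for models returned by the API client.

```python
from types import SimpleNamespace


def describe_model(model):
    """Summarize a model object, labeling the token limit as a limit."""
    return {
        "Name": getattr(model, "name", "Unknown"),
        # getattr with a default replaces the hasattr/else pattern.
        "Input limit": getattr(model, "input_token_limit", "Unknown"),
    }


# Stand-in objects; a real model would come from the API client.
full = SimpleNamespace(name="models/example", input_token_limit=1000)
bare = SimpleNamespace(name="models/other")
print(describe_model(full))
print(describe_model(bare))
```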
Giom-V commented on 2025-03-24T15:22:44Z ----------------------------------------------------------------
Line #2. MODEL_NAME = "gemini-1.5-flash"
Can you use a selector like this?
MODEL_NAME="gemini-2.0-flash" # @param ["gemini-2.0-flash-lite","gemini-2.0-flash","gemini-2.0-pro-exp-02-05"] {"allow-input":true, isTemplate: true}
As it would make the notebook easier to maintain in the future.
Maybe also rename it to "DEFAULT_MODEL_NAME", as I saw that you can override it when initializing GeminiMultimodalChatBot.
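A rough sketch of how the renamed constant and the per-instance override could fit together. `ChatBotSketch` is a hypothetical stand-in for the notebook's GeminiMultimodalChatBot, whose actual constructor may differ.

```python
# In Colab, the @param selector shown above would set this value.
DEFAULT_MODEL_NAME = "gemini-2.0-flash"


class ChatBotSketch:
    """Hypothetical stand-in for the notebook's GeminiMultimodalChatBot."""

    def __init__(self, model_name=None):
        # Fall back to the notebook-wide default when no override is given.
        self.model_name = model_name or DEFAULT_MODEL_NAME


print(ChatBotSketch().model_name)
print(ChatBotSketch("gemini-2.0-flash-lite").model_name)
```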
Giom-V commented on 2025-03-24T15:22:45Z ----------------------------------------------------------------
Line #17. temperature: float = 0.7,
It might be interesting to explain why you chose those values. I think most are the default ones, but I don't think it's the case for the temperature at least.
Giom-V commented on 2025-03-24T15:22:46Z ----------------------------------------------------------------
Line #63. {"role": "user", "parts": ["I need you to follow these instructions: " + system_prompt]},
Any reason why you are not using the system instruction field from the config? Here's an example for chat.
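To make the difference concrete, here is a sketch contrasting the two patterns using plain dictionaries. Both function names are hypothetical. The keys `system_instruction` and `temperature` are assumed to match the SDK's generation config fields, so check them against the SDK version the notebook targets.

```python
def history_with_injected_prompt(system_prompt):
    """Pattern under review: the instructions travel as a fake user turn."""
    return [
        {
            "role": "user",
            "parts": ["I need you to follow these instructions: " + system_prompt],
        }
    ]


def config_with_system_instruction(system_prompt, temperature=0.7):
    """Suggested pattern: instructions live in the config, history starts empty."""
    return {"system_instruction": system_prompt, "temperature": temperature}


prompt = "You are a helpful, friendly assistant."
print(history_with_injected_prompt(prompt))
print(config_with_system_instruction(prompt))
```

With the second pattern the conversation history contains only real turns, so the instructions do not have to be filtered out when displaying or trimming the history.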
Giom-V commented on 2025-03-24T15:22:47Z ----------------------------------------------------------------
Line #3. You are a helpful, friendly assistant. When responding to questions:
For readability, can you add some indentation to the system prompt? For example:
system_prompt = """
    You are a helpful, friendly assistant. When responding to questions:
    - If you're unsure, be honest about your limitations
    - Provide detailed and accurate information
    - For image analysis, describe what you see in detail
    - Use markdown formatting to make responses easy to read
    - When discussing code, include well-commented examples
"""
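If the indentation should stay in the source for readability but not reach the model, one option (an assumption on my side, not something the notebook currently does) is the standard library's `textwrap.dedent`. The prompt below is shortened for the sketch.

```python
import textwrap

# The backslash after the opening quotes avoids a leading blank line;
# dedent then strips the common leading whitespace from every line.
system_prompt = textwrap.dedent("""\
    You are a helpful, friendly assistant. When responding to questions:
    - If you're unsure, be honest about your limitations
    - Provide detailed and accurate information
    """)

print(system_prompt)
```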
Giom-V commented on 2025-03-24T15:22:47Z ----------------------------------------------------------------
Can you add slightly more text to the "API ref", "related examples" and "continue your discovery" sections? At the very least, add bullet points.
@Bhavesh2k4 Thanks for the great submission. I am finally back from my sick leave and found the time to review it.
I just added a couple of minor comments to make the notebook easier to understand and to maintain.
Can you also check the lint and format failures and fix them? The format failure is likely because you haven't run the formatting script. The lint failure is likely because a "we" needs to be changed into a "you", and you likely also have to update the README.md to add a link to your new notebook.
Thanks again!
One last thing, I think it should be moved to the examples/ folder.
Giom-V commented on 2025-03-24T16:01:28Z ----------------------------------------------------------------
I think you need to move that button down to replace the "run in colab" one as I did in #512 (make sure you use the same size).
Hi @Giom-V ,
Thank you so much for reviewing my notebook, especially after returning from sick leave. I appreciate your detailed feedback and the effort you've put into helping me improve the submission.
I apologize for the delayed response. I was in the middle of my university internals, which account for 25% of my grade points, so I couldn't address the comments immediately.
I'll work on making the changes you suggested:
- Fixing the formatting
- Making the requested changes in the notebook file
- Updating the README.md with the new link
I noticed the linting failed primarily due to a URL issue. I'm a bit confused about the notebook loading error. Since I forked the original repo, I'm wondering if I need to modify the GitHub link to point to my forked repository. Even when I tried changing the link, the linting still failed.
This is my first contribution to the Gemini notebook files, so I'm eager to get it right. Could you provide some guidance on resolving the URL/loading issue?
Thanks again for your help and patience!
@Bhavesh2k4 Don't worry about the link (especially since you need to move the notebook to the examples folder, which will change its URL). Worst case, I'll fix it myself.