Added text image multimodal bot - ISSUE 549

Open Bhavesh2k4 opened this issue 10 months ago • 12 comments

Added Gemini Multimodal (Text+Image) Tutorial - Issue #549

Overview

This PR adds a comprehensive tutorial notebook demonstrating how to build a multimodal chatbot using Google's Gemini API. The notebook provides step-by-step guidance on building a solution that can process both text and images simultaneously.

Features Covered

  • Natural language text processing
  • Image analysis capabilities
  • Combined text+image input handling (true multimodal interaction)
  • Conversation history management with visual context
  • Response formatting with markdown
  • Real-time response streaming
  • Parameter customization for different use cases
  • Basic error handling and rate limiting

Technical Requirements

  • Google API key for Gemini
  • Python 3.9+
  • Libraries: google-generativeai, pillow, IPython, pandas, matplotlib

Why This Matters

This tutorial fills a documentation gap by showing how to seamlessly integrate text and vision capabilities in a single application, properly manage conversation history with visual context, and optimize parameters for different response types.

Testing Done

  • Verified all code samples execute properly
  • Tested with various image types and prompt combinations
  • Ensured compatibility with current API version

Colab Link

Run in Google Colab

Bhavesh2k4 avatar Mar 11 '25 07:03 Bhavesh2k4

View / edit / reply to this conversation on ReviewNB

Giom-V commented on 2025-03-24T15:22:43Z ----------------------------------------------------------------

Line #16.                    'Input Types': model.input_token_limit if hasattr(model, 'input_token_limit') else 'Unknown',

I think "Input types" should be "Input limit"
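A minimal sketch of the relabelled row, assuming the model objects expose `input_token_limit` and `output_token_limit` attributes as the reviewed line suggests. The helper name `summarize_model` and the sample values are hypothetical and not taken from the notebook:

```python
from types import SimpleNamespace


def summarize_model(model) -> dict:
    """Build one table row; fall back to 'Unknown' when an attribute is missing."""
    return {
        "Name": getattr(model, "name", "Unknown"),
        # Labelled "Input limit" because the value is a token limit, not a type.
        "Input limit": getattr(model, "input_token_limit", "Unknown"),
        "Output limit": getattr(model, "output_token_limit", "Unknown"),
    }


# Stand-in object with made-up values, used only to exercise the helper.
example = SimpleNamespace(
    name="models/example-model", input_token_limit=1000, output_token_limit=100
)
row = summarize_model(example)
```

Using `getattr` with a default keeps the same fallback behaviour as the `hasattr` check in the original line.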


Giom-V commented on 2025-03-24T15:22:44Z ----------------------------------------------------------------

Line #2.    MODEL_NAME = "gemini-1.5-flash"

Can you use a selector like this?

MODEL_NAME="gemini-2.0-flash" # @param ["gemini-2.0-flash-lite","gemini-2.0-flash","gemini-2.0-pro-exp-02-05"] {"allow-input":true, isTemplate: true}

As it would make the notebook easier to maintain in the future.

Maybe also rename it to "DEFAULT_MODEL_NAME" as I saw that you can overwrite it when initializing GeminiMultimodalChatBot
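One way the rename could look, as a sketch only. The constructor below is a hypothetical stand-in for the notebook's `GeminiMultimodalChatBot`, whose real signature is not shown in this thread:

```python
# Colab form selector copied from the review comment above.
DEFAULT_MODEL_NAME = "gemini-2.0-flash"  # @param ["gemini-2.0-flash-lite","gemini-2.0-flash","gemini-2.0-pro-exp-02-05"] {"allow-input":true, isTemplate: true}


class GeminiMultimodalChatBot:
    """Hypothetical stand-in showing only how the default is overridden."""

    def __init__(self, model_name: str = DEFAULT_MODEL_NAME):
        # Callers can pass another model; otherwise the notebook-wide default applies.
        self.model_name = model_name


default_bot = GeminiMultimodalChatBot()
lite_bot = GeminiMultimodalChatBot(model_name="gemini-2.0-flash-lite")
```

The `DEFAULT_` prefix signals that the selector sets a fallback value and not the only model the class can use.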


Giom-V commented on 2025-03-24T15:22:45Z ----------------------------------------------------------------

Line #17.            temperature: float = 0.7,

It might be interesting to explain why you chose those values. I think most are the default ones, but I don't think it's the case for the temperature at least.
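A possible way to document the choice in code, sketched under assumptions: only the `0.7` temperature comes from the reviewed line. The preset names, the other values, and the helper `make_generation_config` are hypothetical illustrations:

```python
# Hypothetical presets; only the 0.7 value appears in the notebook under review.
GENERATION_PRESETS = {
    # Lower temperature: more deterministic answers, e.g. for factual questions.
    "precise": {"temperature": 0.2},
    # The notebook's chosen value: some variety while staying on topic.
    "balanced": {"temperature": 0.7},
    # Higher temperature: more varied wording, e.g. for brainstorming.
    "creative": {"temperature": 1.0},
}


def make_generation_config(preset: str = "balanced", **overrides) -> dict:
    """Return a copy of a preset with optional per-call overrides applied."""
    if preset not in GENERATION_PRESETS:
        raise ValueError(f"Unknown preset: {preset!r}")
    config = dict(GENERATION_PRESETS[preset])  # copy so the preset is never mutated
    config.update(overrides)
    return config


config = make_generation_config()
```

Keeping the rationale next to each value answers the "why this value" question where the reader meets the number.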


Giom-V commented on 2025-03-24T15:22:46Z ----------------------------------------------------------------

Line #63.                    {"role": "user", "parts": ["I need you to follow these instructions: " + system_prompt]},

Any reason why you are not using the system instruction field from the config? Here's an example for chat.
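For illustration, a sketch of passing the prompt as a system instruction, assuming the `google-generativeai` package listed in the PR description, whose `GenerativeModel` constructor accepts a `system_instruction` argument. The helper names `build_model_kwargs` and `start_chat` are hypothetical, and this is not the example the reviewer linked:

```python
def build_model_kwargs(system_prompt: str, model_name: str = "gemini-2.0-flash") -> dict:
    """Collect constructor arguments, with the prompt as a system instruction."""
    return {"model_name": model_name, "system_instruction": system_prompt}


def start_chat(system_prompt: str, api_key: str):
    """Start a chat with an empty history; the instructions live outside it."""
    # Imported lazily so build_model_kwargs can be used without the package installed.
    import google.generativeai as genai

    genai.configure(api_key=api_key)
    model = genai.GenerativeModel(**build_model_kwargs(system_prompt))
    # No fake "user" turn is needed to carry the instructions.
    return model.start_chat(history=[])


kwargs = build_model_kwargs("You are a helpful, friendly assistant.")
```

With this approach the chat history contains only real conversation turns, so the instructions are not mixed in with user messages.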


Giom-V commented on 2025-03-24T15:22:47Z ----------------------------------------------------------------

Line #3.    You are a helpful, friendly assistant. When responding to questions:

For readability, can you just add some indentation to the system prompt? Like

system_prompt = """
  You are a helpful, friendly assistant. When responding to questions:
  - If you're unsure, be honest about your limitations
  - Provide detailed and accurate information
  - For image analysis, describe what you see in detail
  - Use markdown formatting to make responses easy to read
  - When discussing code, include well-commented examples
"""
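One caveat with indenting inside a triple-quoted string is that the leading spaces become part of the prompt text. A small sketch of one way to keep the source indented while sending a clean prompt, using the standard library's `textwrap.dedent` (this is a suggestion, not something the reviewer asked for):

```python
import textwrap

# dedent removes the common leading whitespace; strip drops the surrounding newlines.
system_prompt = textwrap.dedent("""
    You are a helpful, friendly assistant. When responding to questions:
    - If you're unsure, be honest about your limitations
    - Provide detailed and accurate information
    - For image analysis, describe what you see in detail
    - Use markdown formatting to make responses easy to read
    - When discussing code, include well-commented examples
""").strip()
```

The source stays readable, and the string passed to the model starts directly with the first instruction.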

Giom-V commented on 2025-03-24T15:22:47Z ----------------------------------------------------------------

Can you add slightly more text to the API ref, related examples and continue your discovery sections? At the very least, add bullet points.


@Bhavesh2k4 Thanks for the great submission. I am finally back from my sick leave and found the time to review it.

I just added a couple of minor comments to make the notebook easier to understand and to maintain.

Can you also check the lint and format failures and fix them? (The format failure is likely because you haven't run the formatting script, and the lint failure because a "we" needs to be changed into a "you"; you likely also have to update the README.md to add a link to your new notebook.)

Thanks again!

Giom-V avatar Mar 24 '25 15:03 Giom-V

One last thing, I think it should be moved to the examples/ folder.

Giom-V avatar Mar 24 '25 15:03 Giom-V

Giom-V commented on 2025-03-24T16:01:28Z ----------------------------------------------------------------

I think you need to move that button down to replace the "run in colab" one as I did in #512 (make sure you use the same size).


Hi @Giom-V ,

Thank you so much for reviewing my notebook, especially after returning from sick leave. I appreciate your detailed feedback and the effort you've put into helping me improve the submission.

I apologize for the delayed response. I was in the middle of my university internals, which account for 25% of my grade points, so I couldn't address the comments immediately.

I'll work on making the changes you suggested:

  • Fixing the formatting
  • Making the changes in the notebook file
  • Updating the README.md with the new link

I noticed the linting failed primarily due to a URL issue. I'm a bit confused about the notebook loading error. Since I forked the original repo, I'm wondering if I need to modify the GitHub link to point to my forked repository. Even when I tried changing the link, the linting still failed.

This is my first contribution to the Gemini notebook files, so I'm eager to get it right. Could you provide some guidance on resolving the URL/loading issue?

Thanks again for your help and patience!

Bhavesh2k4 avatar Mar 27 '25 02:03 Bhavesh2k4

@Bhavesh2k4 Don't worry about the link (especially since you need to move the notebook into the examples folder, which will change its URL). Worst case, I'll fix it myself.

Giom-V avatar Mar 31 '25 09:03 Giom-V