
Fix issue #7830: Add support for Gemini preview model versions

Open erkinalp opened this issue 9 months ago • 9 comments

  • [ ] This change is worth documenting at https://docs.all-hands.dev/
  • [x] Include this change in the Release Notes. If checked, you must provide an end-user friendly description for your change below

End-user friendly description of the problem this fixes or functionality that this introduces.

This PR fixes an issue where users were unable to run agents using the gemini-2.5-pro-preview-03-25 model. After this fix, OpenHands will properly support Gemini preview model versions, allowing users to use the latest Gemini models with their agents. The fix also supports models with complex paths like openrouter/google/gemini-2.5-pro-preview-03-25.


Give a summary of what the PR does, explaining any non-trivial design decisions.

The PR modifies the model name check in the LLM class to handle Gemini preview versions. The issue was that the code only checked for exact matches with gemini-2.5-pro in the FUNCTION_CALLING_SUPPORTED_MODELS list, but didn't handle preview versions like gemini-2.5-pro-preview-03-25.

The solution:

  1. Creates a helper function is_gemini_preview_supported() to check if a model name is a supported Gemini preview version
  2. Checks if the model name starts with "gemini-" and contains "preview"
  3. Extracts the base model name (before "-preview") and checks if it's in the supported models list
  4. Handles complex model paths by splitting the full path and checking each part (e.g., openrouter/google/gemini-2.5-pro-preview-03-25)

This approach ensures that any preview version of a supported Gemini model will be correctly identified as supporting function calling, regardless of how it's referenced in the model path.
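The four steps above can be sketched roughly as follows. This is only an illustration, not the code from the PR: the list below is a small hypothetical stand-in for FUNCTION_CALLING_SUPPORTED_MODELS, and the function body is an assumption based on the description.

```python
# Hypothetical subset of the supported-models list, for illustration only.
FUNCTION_CALLING_SUPPORTED_MODELS = ["gemini-2.5-pro", "gpt-4o"]


def is_gemini_preview_supported(model: str) -> bool:
    """Return True if any path segment is a preview of a supported Gemini model."""
    # Step 4: split complex paths such as "openrouter/google/<model>".
    for part in model.split("/"):
        # Step 2: the segment must start with "gemini-" and contain "preview".
        if part.startswith("gemini-") and "preview" in part:
            # Step 3: the base name is everything before "-preview".
            base = part.split("-preview", 1)[0]
            if base in FUNCTION_CALLING_SUPPORTED_MODELS:
                return True
    return False


print(is_gemini_preview_supported("gemini-2.5-pro-preview-03-25"))
print(is_gemini_preview_supported("openrouter/google/gemini-2.5-pro-preview-03-25"))
print(is_gemini_preview_supported("gemini-2.5-pro"))  # not a preview name
```

Under these assumptions the first two calls return True and the last returns False, since the helper only covers preview names; non-preview names are left to the existing check.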

A comprehensive test case was added to verify that both regular and preview versions of Gemini models are correctly identified, including models with complex paths.


Link of any specific issues this addresses.

Fixes #7830

erkinalp avatar Apr 12 '25 15:04 erkinalp

So far I've been getting errors when calling with gemini/gemini-2.5-pro. Though apparently, the following snippet worked for me:

import os
from litellm import completion

os.environ['GEMINI_API_KEY'] = "FILL_IN_YOUR_KEY"
response = completion(
    model="gemini/gemini-2.5-pro-preview-03-25",
    messages=[{"role": "user", "content": "write code for saying hi from LiteLLM"}]
)
print(response.choices[0].message.content)


After asking Gemini, it seems there may be a discrepancy between using models via Google AI Studio versus Vertex AI:

LiteLLM supports a variety of Gemini models through its integration with Google AI Studio and Vertex AI. Here's a breakdown of the Gemini LLMs that LiteLLM can work with:

Through Google AI Studio:

  • Gemini Pro: This is a multimodal model capable of handling text and code. You can access it via the model name "gemini/gemini-pro" in LiteLLM.
  • Gemini Pro Vision: This model extends Gemini Pro's capabilities to understand and reason about visual information (images and video) in addition to text and code. Use "gemini/gemini-1.0-pro-vision-001" to access it through LiteLLM.
  • Gemini 1.5 Pro: This is a more advanced multimodal model with a very large context window. You can use "gemini/gemini-1.5-pro" or "gemini/gemini-1.5-pro-latest" to access it. LiteLLM also supports sending response_schema as a parameter for this model.
  • Gemini 1.5 Flash: This is a faster and more cost-effective version of the Gemini 1.5 model, optimized for high-throughput applications. Access it using "gemini/gemini-1.5-flash".

Through Vertex AI:

  • LiteLLM supports various Gemini models available on Google Cloud's Vertex AI platform. The specific model names will typically follow the format "vertex_ai/gemini-...". Examples include:
    • "vertex_ai/gemini-pro"
    • "vertex_ai/gemini-1.5-pro"
    • "vertex_ai/gemini-1.5-flash-preview-0514"
  • When using Vertex AI models, you might need to provide your Vertex AI project ID and optionally the location and credentials through the litellm_params.

Key things to note:

  • To use Gemini models with LiteLLM, you'll generally need to set your Gemini API key as an environment variable (e.g., GEMINI_API_KEY).
  • LiteLLM aims to provide a consistent OpenAI-like interface for interacting with these models, simplifying the process of switching between different LLMs.
  • You can utilize LiteLLM's features like logging, cost tracking, and pass-through endpoints with Gemini models.
  • For the most up-to-date and specific model identifiers, it's always recommended to refer to the official LiteLLM documentation for the Gemini integration.

In summary, LiteLLM offers broad support for the Gemini family of LLMs, allowing you to leverage their diverse capabilities within your applications through a unified API.

avi12 avatar Apr 12 '25 17:04 avi12

What errors did you get when using Gemini, @avi12?

There are indeed differences between VertexAI and Google AI, at least as far as model availability is concerned (not all models are in both places all the time, it seems), and indeed VertexAI requires more parameters to authenticate with it. Vertex is fairly complicated IMO. Have you used both?

enyst avatar Apr 12 '25 17:04 enyst

Mine runs with Gemini 2.5; however, it can't call tools.

H0ARK avatar Apr 12 '25 18:04 H0ARK

Quoting @openhands-agent:


@enyst Thank you for reviewing the PR!

The issue this PR is addressing is specifically related to the error reported in issue #7830, where the user was unable to run an agent using the gemini-2.5-pro-preview-03-25 model.

While you're right that Gemini 2.5 Pro is supported, the specific preview version gemini-2.5-pro-preview-03-25 is not being recognized correctly. The current code only checks for exact matches with gemini-2.5-pro in the FUNCTION_CALLING_SUPPORTED_MODELS list, but doesn't handle preview versions like gemini-2.5-pro-preview-03-25.

From the log file attached to the issue, we can see that the Docker runtime container exits with an error:

14:46:00 - openhands:INFO: docker_runtime.py:174 - [runtime 8f5d83d937254639bd9d7ffc081041f1] Waiting for client to become ready at http://localhost:38075...
14:46:15 - openhands:ERROR: agent_session.py:319 - Runtime initialization failed: Container openhands-runtime-8f5d83d937254639bd9d7ffc081041f1 has exited.

The issue reporter (@avi12) confirmed in a comment that they were getting errors when using OpenHands with gemini-2.5-pro-preview-03-25, but were able to use it directly with LiteLLM:

import os
from litellm import completion

os.environ['GEMINI_API_KEY'] = "FILL_IN_YOUR_KEY"
response = completion(
    model="gemini/gemini-2.5-pro-preview-03-25",
    messages=[{"role": "user", "content": "write code for saying hi from LiteLLM"}]
)
print(response.choices[0].message.content)

This suggests that while LiteLLM supports the preview version, OpenHands' model recognition logic doesn't properly handle it.

The fix adds logic to handle Gemini preview versions by:

  1. Creating a helper function is_gemini_preview_supported() to check if a model name is a supported Gemini preview version
  2. Checking if the model name starts with "gemini-" and contains "preview"
  3. Extracting the base model name (before "-preview") and checking if it's in the supported models list
  4. Handling complex model paths by splitting the full path and checking each part (e.g., openrouter/google/gemini-2.5-pro-preview-03-25)

This ensures that any preview version of a supported Gemini model will be correctly identified as supporting function calling, regardless of how it's referenced in the model path.

As @avi12 mentioned, there are differences between using models via Google AI Studio versus Vertex AI, and this PR helps ensure that preview versions of supported models work correctly regardless of the path format used to reference them.

erkinalp avatar Apr 12 '25 18:04 erkinalp

Mine runs with Gemini 2.5; however, it can't call tools.

@H0ARK Could you please explain exactly which model you entered in the UI, from which provider, and how you can tell that there is no tool use?

I'm looking at the code just now, in the debugger, and I see this check passes just fine:

model_name_supported = (
    self.config.model in FUNCTION_CALLING_SUPPORTED_MODELS
    or self.config.model.split('/')[-1] in FUNCTION_CALLING_SUPPORTED_MODELS
    or any(m in self.config.model for m in FUNCTION_CALLING_SUPPORTED_MODELS)
)

For self.config.model = 'gemini/gemini-2.5-pro-preview-03-25', the result is model_name_supported = True.
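To see which clause of that check is doing the work, here is a small standalone sketch. The list is a hypothetical stand-in for FUNCTION_CALLING_SUPPORTED_MODELS (the real list is longer), and check_clauses is a name made up for this example.

```python
# Hypothetical subset of the supported-models list, for illustration only.
FUNCTION_CALLING_SUPPORTED_MODELS = ["gemini-2.5-pro", "gpt-4o"]


def check_clauses(model: str) -> dict:
    """Evaluate each clause of the model-name check separately."""
    return {
        # Clause 1: the full configured name is listed as-is.
        "exact": model in FUNCTION_CALLING_SUPPORTED_MODELS,
        # Clause 2: the name without its provider prefix is listed.
        "last_segment": model.split("/")[-1] in FUNCTION_CALLING_SUPPORTED_MODELS,
        # Clause 3: some listed name occurs as a substring of the model.
        "substring": any(m in model for m in FUNCTION_CALLING_SUPPORTED_MODELS),
    }


clauses = check_clauses("gemini/gemini-2.5-pro-preview-03-25")
print(clauses)
print("model_name_supported =", any(clauses.values()))
```

With this stand-in list, only the substring clause matches the preview name, because "gemini-2.5-pro" is contained in "gemini-2.5-pro-preview-03-25". That is enough for the overall check to pass.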

If you experience no tool use, you can also force it by adding to config.toml:

[llm]
native_tool_calling=true

enyst avatar Apr 12 '25 19:04 enyst

@erkinalp So you were fixing the issue submitted by @avi12? But the errors in avi12's log file are about Docker: the Docker runtime failed to initialize on Windows WSL.

It's not about Gemini, as far as I can see.

I think the LLM you used with openhands-agent may be hallucinating, or trying too hard to do what the humans ask. Out of curiosity, what LLM was it? You could perhaps ask it instead to first write unit tests checking that Gemini 2.5 Pro works with tool use. Then, if the tests fail, let's figure out the cause. I could be wrong, but I don't see why it would fail, and I cannot reproduce a problem.
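Such a test could look roughly like the sketch below. It only covers model-name recognition, not a live tool call against Gemini. SUPPORTED and supports_function_calling are hypothetical stand-ins for the real list and check, not names from the OpenHands code base.

```python
import unittest

# Hypothetical stand-in for the supported-models list.
SUPPORTED = ["gemini-2.5-pro"]


def supports_function_calling(model: str) -> bool:
    """Stand-in for the model-name check: exact, last segment, or substring."""
    return (
        model in SUPPORTED
        or model.split("/")[-1] in SUPPORTED
        or any(m in model for m in SUPPORTED)
    )


class GeminiFunctionCallingTest(unittest.TestCase):
    def test_gemini_names_are_recognized(self):
        cases = [
            "gemini-2.5-pro",
            "gemini/gemini-2.5-pro",
            "gemini/gemini-2.5-pro-preview-03-25",
            "openrouter/google/gemini-2.5-pro-preview-03-25",
        ]
        for model in cases:
            # subTest reports each failing name individually.
            with self.subTest(model=model):
                self.assertTrue(supports_function_calling(model))

    def test_unknown_model_is_rejected(self):
        self.assertFalse(supports_function_calling("some-provider/unknown-model"))


# Run the suite explicitly so the snippet also works outside a test runner.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(GeminiFunctionCallingTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

If a case like the preview name failed here, that would point at the name check; if everything passes, the cause is more likely elsewhere.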

Edited to add: I renamed avi12's bug report to state that it is about the runtime's initialization. If there are errors with Gemini, I would love to see a log file. We should fix it in that case; Gemini 2.5 Pro is good and should work.

enyst avatar Apr 12 '25 20:04 enyst

@erkinalp Please see what openhands-agent did when it was asked to make unit tests only:

  • https://github.com/All-Hands-AI/OpenHands/pull/7838

I would love your review! Please let me know if you think it's incorrect or missing cases.

enyst avatar Apr 12 '25 20:04 enyst

@enyst let's merge changes from both PRs

erkinalp avatar Apr 13 '25 09:04 erkinalp

@erkinalp OK, thank you! I don't think the PR fixes #7830 (the Docker issue), though; maybe that got fixed anyway, according to the comments.

If anything doesn't seem to work well with Gemini-2.5-Pro, I would love an error log, or other kind of log or screenshot. In my experience, it's a good model, and it works with function calling.

I'm working on a branch to improve how it works with openhands here, and I'd be happy to have your feedback: 😄

  • https://github.com/All-Hands-AI/OpenHands/pull/7748

enyst avatar Apr 14 '25 15:04 enyst

This PR is stale because it has been open for 30 days with no activity. Remove stale label or comment or this will be closed in 7 days.

github-actions[bot] avatar May 19 '25 02:05 github-actions[bot]