[Feature]: Make it possible to reduce Gemini safety settings
What problem or use case are you trying to solve?
Gemini has some very high safety settings by default, which cause it to refuse to generate code sometimes.
Describe the UX of the solution you'd like
It would be good to either:
- reduce the Gemini safety settings by default (easier)
- allow the user to specify the Gemini safety settings (harder)
Do you have thoughts on the technical implementation?
LiteLLM has very good documentation on how to reduce the safety settings: https://litellm.vercel.app/docs/providers/gemini#specifying-safety-settings
These would be modified in the OpenDevin/opendevin/llm/llm.py file.
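For illustration, here is a minimal sketch of what passing `safety_settings` through LiteLLM looks like, following the docs linked above. The model name and prompt are placeholders; the actual change would go through the existing `partial(litellm_completion, ...)` setup in llm.py rather than a standalone call like this.

```python
# Minimal sketch based on the LiteLLM Gemini docs linked above; not OpenDevin code.
from litellm import completion

response = completion(
    model='gemini/gemini-1.5-pro',  # placeholder model name
    messages=[{'role': 'user', 'content': 'List the folders inside this folder.'}],
    # Each Gemini harm category can be relaxed from its default down to BLOCK_NONE.
    safety_settings=[
        {'category': 'HARM_CATEGORY_HARASSMENT', 'threshold': 'BLOCK_NONE'},
        {'category': 'HARM_CATEGORY_HATE_SPEECH', 'threshold': 'BLOCK_NONE'},
        {'category': 'HARM_CATEGORY_SEXUALLY_EXPLICIT', 'threshold': 'BLOCK_NONE'},
        {'category': 'HARM_CATEGORY_DANGEROUS_CONTENT', 'threshold': 'BLOCK_NONE'},
    ],
)
print(response.choices[0].message.content)
```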
Additional context
- This is from a discord discussion: https://discord.com/channels/1222935860639563850/1252721001557528660/1252721001557528660
- This would be a great issue for a new contributor to OpenDevin to fix :)
If neither the user nor OpenDevin can predict if/when an action would trigger a safety threshold, it's quite hard to judge whether the effort is worthwhile.
Personally, I've run Gemini with different sites/providers and not once had a coding-related input blocked. So either there is some sort of pre-unlock happening on their backend (which is unknown to me), or if someone else managed to trigger such a threshold, Gemini might have been right. 🤣
In that vein, I'd prefer this as an opt-in option for the user, i.e. via the usual config values (toml and/or LITELLM_ env vars etc.), along with appropriate documentation. For safety reasons it shouldn't be a default "unlock", since not only adults may use OpenDevin.
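As a rough sketch of the opt-in idea (the env var name below is hypothetical and does not exist in OpenDevin today; the real option name, and whether it lives in config.toml or an env var, would be decided in the PR):

```python
import os

# Hypothetical opt-in flag: nothing here is an existing OpenDevin option.
disable_gemini_safety = (
    os.environ.get('LLM_DISABLE_GEMINI_SAFETY', 'false').lower() in ('1', 'true')
)

model_name = 'gemini/gemini-1.5-pro'  # placeholder; in llm.py this is self.model_name

extra_kwargs = {}
if disable_gemini_safety and 'gemini' in model_name:
    # Only relax the filters when the user explicitly asked for it.
    extra_kwargs['safety_settings'] = [
        {'category': category, 'threshold': 'BLOCK_NONE'}
        for category in (
            'HARM_CATEGORY_HARASSMENT',
            'HARM_CATEGORY_HATE_SPEECH',
            'HARM_CATEGORY_SEXUALLY_EXPLICIT',
            'HARM_CATEGORY_DANGEROUS_CONTENT',
        )
    ]
```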
I inserted this piece of code into llm.py. Is that not enough?
```python
if 'gemini' in self.model_name:
    safety_settings = [
        {
            'category': 'HARM_CATEGORY_HARASSMENT',
            'threshold': 'BLOCK_NONE',
        },
        {
            'category': 'HARM_CATEGORY_HATE_SPEECH',
            'threshold': 'BLOCK_NONE',
        },
        {
            'category': 'HARM_CATEGORY_SEXUALLY_EXPLICIT',
            'threshold': 'BLOCK_NONE',
        },
        {
            'category': 'HARM_CATEGORY_DANGEROUS_CONTENT',
            'threshold': 'BLOCK_NONE',
        },
    ]
    extra_kwargs = {
        'safety_settings': safety_settings,
    }

if 'gemini' in self.model_name:
    self._completion = partial(
        litellm_completion,
        model=self.model_name,
        api_key=self.api_key,
        base_url=self.base_url,
        api_version=self.api_version,
        custom_llm_provider=custom_llm_provider,
        max_tokens=self.max_output_tokens,
        timeout=self.llm_timeout,
        temperature=llm_temperature,
        top_p=llm_top_p,
        **extra_kwargs,
    )
else:
    self._completion = partial(
        litellm_completion,
        model=self.model_name,
        api_key=self.api_key,
        base_url=self.base_url,
        api_version=self.api_version,
        custom_llm_provider=custom_llm_provider,
        max_tokens=self.max_output_tokens,
        timeout=self.llm_timeout,
        temperature=llm_temperature,
        top_p=llm_top_p,
    )
```
I personally cannot reproduce the high resolved rate reported for Deepseek-coder-instruct-v2, even with this safety setting. Is there any insight on this?
@lwaekfjlk , thanks! Your code looks good, but it'd be nice to avoid the two self._completion() calls, duplicated code is a source of errors.
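For what it's worth, here is a minimal sketch of one way to collapse the two branches: build `extra_kwargs` unconditionally and pass it into a single `partial()` call. Variable names follow the snippet above (it would live in the same spot in llm.py), so treat it as a refactoring idea rather than a drop-in patch.

```python
# Sketch only: same context as the snippet above (inside the LLM setup in llm.py).
extra_kwargs = {}
if 'gemini' in self.model_name:
    # Relax Gemini's default safety filters via LiteLLM's safety_settings parameter.
    extra_kwargs['safety_settings'] = [
        {'category': category, 'threshold': 'BLOCK_NONE'}
        for category in (
            'HARM_CATEGORY_HARASSMENT',
            'HARM_CATEGORY_HATE_SPEECH',
            'HARM_CATEGORY_SEXUALLY_EXPLICIT',
            'HARM_CATEGORY_DANGEROUS_CONTENT',
        )
    ]

# Single partial() call; **extra_kwargs is a no-op for non-Gemini models.
self._completion = partial(
    litellm_completion,
    model=self.model_name,
    api_key=self.api_key,
    base_url=self.base_url,
    api_version=self.api_version,
    custom_llm_provider=custom_llm_provider,
    max_tokens=self.max_output_tokens,
    timeout=self.llm_timeout,
    temperature=llm_temperature,
    top_p=llm_top_p,
    **extra_kwargs,
)
```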
> I personally cannot reproduce the high resolved rate reported for Deepseek-coder-instruct-v2, even with this safety setting. Is there any insight on this?
Maybe we could open a new "benchmarking deepseek-v2" issue where you can raise this point?
Yes, sure. For the code, it's just simple, dirty code for my personal use right now. I will create a PR for the current issue.
Just for reference, did you also notice lots of Gemini refusals?
I do notice Gemini suddenly stops and thinks it is finished after a few turns with these safety settings (actually no patch is generated). I haven't checked for refusals yet.
This is the visualization result of the trajectory I collected from gemini-1.5-pro.
Theoretically, I should not face any refusals from Gemini after setting these safety settings, right?
Interesting! Gemini won't explicitly filter out your requests, but the LM itself might still refuse due to alignment.
But yeah, I think we should probably have a mechanism for removing the safety filters and instead rely on the Gemini model's own alignment training.
> I do notice Gemini suddenly stops and thinks it is finished after a few turns with these safety settings (actually no patch is generated). I haven't checked for refusals yet.
Oh, that is interesting: I never saw an explicit refusal when using Gemini (an actual response saying so), but I did see intermittent stopping mid-answer, sometimes just a couple of lines in. In some cases I could just message "continue", or I reworded my request. I wasn't aware this might be related to this topic.
I think OpenDevin should include an agent to check and filter out any harmful prompts. As responsible AI progresses, models will generally incorporate their own checks and filters. Therefore, in the future, other models might also refuse to generate code.
> I think OpenDevin should include an agent to check and filter out any harmful prompts. As responsible AI progresses, models will generally incorporate their own checks and filters. Therefore, in the future, other models might also refuse to generate code.
I don't see it as OpenDevin's job to do such a thing, just as an email program or text editor won't prevent you from typing up and sending out "harmful" content. If LLMs object to simple coding tasks, that's an error the vendor needs to fix with better training (data), and I doubt any vendor is actively working to make things worse, as your "Therefore" implies.
Forgot to mention: the LLM is only one level where moderation may take place. Model providers already can/do have their own mechanisms for that, and for some models they offer explicit "self-moderated" access to an LLM as an alternative.
Here is my experience: On the first day that I tried using Gemini 1.5 Flash from OpenDevin, it refused a very benign prompt ('list the folders inside this folder'), saying that the risk of dangerous content was medium. I addressed this (successfully, so far) in a similar way to what lwaekfjlk describes. (That is to say, the language model did not produce a response. Instead, the code raised an exception.)
Yeah, I think the default Gemini filtering settings go overboard, and really malicious things will be stopped by the LM itself anyway. It'd be great to have a PR to address this.
This issue is stale because it has been open for 30 days with no activity. Remove stale label or comment or this will be closed in 7 days.