[Feature]: Make it possible to reduce Gemini safety settings
What problem or use case are you trying to solve?
Gemini has some very high safety settings by default, which cause it to refuse to generate code sometimes.
Describe the UX of the solution you'd like
It would be good to either:
- reduce the Gemini safety settings by default (easier)
- allow the user to specify the Gemini safety settings (harder)
Do you have thoughts on the technical implementation?
LiteLLM has very good documentation on how to reduce the safety settings: https://litellm.vercel.app/docs/providers/gemini#specifying-safety-settings
These would be modified in the OpenDevin/opendevin/llm/llm.py file.
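For illustration, here is a minimal sketch of what passing `safety_settings` through LiteLLM looks like, following the docs linked above. The model name and prompt are placeholders; the actual change would go through the existing `partial(litellm_completion, ...)` setup in llm.py rather than a standalone call like this.

```python
# Minimal sketch based on the LiteLLM Gemini docs linked above; not OpenDevin code.
from litellm import completion

response = completion(
    model='gemini/gemini-1.5-pro',  # placeholder model name
    messages=[{'role': 'user', 'content': 'List the folders inside this folder.'}],
    # Each Gemini harm category can be relaxed from its default down to BLOCK_NONE.
    safety_settings=[
        {'category': 'HARM_CATEGORY_HARASSMENT', 'threshold': 'BLOCK_NONE'},
        {'category': 'HARM_CATEGORY_HATE_SPEECH', 'threshold': 'BLOCK_NONE'},
        {'category': 'HARM_CATEGORY_SEXUALLY_EXPLICIT', 'threshold': 'BLOCK_NONE'},
        {'category': 'HARM_CATEGORY_DANGEROUS_CONTENT', 'threshold': 'BLOCK_NONE'},
    ],
)
print(response.choices[0].message.content)
```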
Additional context
- This is from a discord discussion: https://discord.com/channels/1222935860639563850/1252721001557528660/1252721001557528660
- This would be a great issue for a new contributor to OpenDevin to fix :)
If neither the user nor OpenDevin can predict if/when an action would trigger a safety threshold, it's quite hard to judge whether the effort is worthwhile.
Personally, I've run Gemini with different sites/providers and not once had a coding-related input blocked. So either there is some sort of pre-unlock happening on their backend (which is unknown to me), or if someone else managed to trigger such a threshold, Gemini might have been right. 🤣
In that vein, I'd prefer this as an opt-in option for the user, i.e. via the usual config values (toml and/or LITELLM_ env vars etc.), along with appropriate documentation. For safety reasons it shouldn't be a default "unlock", since not only adults may use OpenDevin.
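As a rough sketch of the opt-in idea (the env var name below is hypothetical and does not exist in OpenDevin today; the real option name, and whether it lives in config.toml or an env var, would be decided in the PR):

```python
import os

# Hypothetical opt-in flag: nothing here is an existing OpenDevin option.
disable_gemini_safety = (
    os.environ.get('LLM_DISABLE_GEMINI_SAFETY', 'false').lower() in ('1', 'true')
)

model_name = 'gemini/gemini-1.5-pro'  # placeholder; in llm.py this is self.model_name

extra_kwargs = {}
if disable_gemini_safety and 'gemini' in model_name:
    # Only relax the filters when the user explicitly asked for it.
    extra_kwargs['safety_settings'] = [
        {'category': category, 'threshold': 'BLOCK_NONE'}
        for category in (
            'HARM_CATEGORY_HARASSMENT',
            'HARM_CATEGORY_HATE_SPEECH',
            'HARM_CATEGORY_SEXUALLY_EXPLICIT',
            'HARM_CATEGORY_DANGEROUS_CONTENT',
        )
    ]
```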
I inserted this piece of code into llm.py. Is that not enough?
```python
if 'gemini' in self.model_name:
    safety_settings = [
        {
            'category': 'HARM_CATEGORY_HARASSMENT',
            'threshold': 'BLOCK_NONE',
        },
        {
            'category': 'HARM_CATEGORY_HATE_SPEECH',
            'threshold': 'BLOCK_NONE',
        },
        {
            'category': 'HARM_CATEGORY_SEXUALLY_EXPLICIT',
            'threshold': 'BLOCK_NONE',
        },
        {
            'category': 'HARM_CATEGORY_DANGEROUS_CONTENT',
            'threshold': 'BLOCK_NONE',
        },
    ]
    extra_kwargs = {
        'safety_settings': safety_settings,
    }

if 'gemini' in self.model_name:
    self._completion = partial(
        litellm_completion,
        model=self.model_name,
        api_key=self.api_key,
        base_url=self.base_url,
        api_version=self.api_version,
        custom_llm_provider=custom_llm_provider,
        max_tokens=self.max_output_tokens,
        timeout=self.llm_timeout,
        temperature=llm_temperature,
        top_p=llm_top_p,
        **extra_kwargs,
    )
else:
    self._completion = partial(
        litellm_completion,
        model=self.model_name,
        api_key=self.api_key,
        base_url=self.base_url,
        api_version=self.api_version,
        custom_llm_provider=custom_llm_provider,
        max_tokens=self.max_output_tokens,
        timeout=self.llm_timeout,
        temperature=llm_temperature,
        top_p=llm_top_p,
    )
```
I personally cannot reproduce the high resolved rate reported for Deepseek-coder-instruct-v2, even with this safety setting. Is there any insight on this?
@lwaekfjlk , thanks! Your code looks good, but it'd be nice to avoid the two self._completion() calls, duplicated code is a source of errors.
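For what it's worth, here is a minimal sketch of one way to collapse the two branches: build `extra_kwargs` unconditionally and pass it into a single `partial()` call. Variable names follow the snippet above (it would live in the same spot in llm.py), so treat it as a refactoring idea rather than a drop-in patch.

```python
# Sketch only: same context as the snippet above (inside the LLM setup in llm.py).
extra_kwargs = {}
if 'gemini' in self.model_name:
    # Relax Gemini's default safety filters via LiteLLM's safety_settings parameter.
    extra_kwargs['safety_settings'] = [
        {'category': category, 'threshold': 'BLOCK_NONE'}
        for category in (
            'HARM_CATEGORY_HARASSMENT',
            'HARM_CATEGORY_HATE_SPEECH',
            'HARM_CATEGORY_SEXUALLY_EXPLICIT',
            'HARM_CATEGORY_DANGEROUS_CONTENT',
        )
    ]

# Single partial() call; **extra_kwargs is a no-op for non-Gemini models.
self._completion = partial(
    litellm_completion,
    model=self.model_name,
    api_key=self.api_key,
    base_url=self.base_url,
    api_version=self.api_version,
    custom_llm_provider=custom_llm_provider,
    max_tokens=self.max_output_tokens,
    timeout=self.llm_timeout,
    temperature=llm_temperature,
    top_p=llm_top_p,
    **extra_kwargs,
)
```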
> I personally cannot reproduce the high resolved rate reported for Deepseek-coder-instruct-v2, even with this safety setting. Is there any insight on this?
Maybe we could open a new "benchmarking deepseek-v2" issue where you can raise this point?
Yes, sure. For the code, it's just simple, dirty code for my personal use right now. I will create a PR for the current issue.
Just for reference, did you also notice lots of Gemini refusals?
I do notice Gemini suddenly stops and thinks it is finished after a few turns with these safety settings (actually no patch is generated). I haven't checked for refusals yet.
This is the visualization result of the trajectory I collected from gemini-1.5-pro.
Theoretically, I should not face any refusals from Gemini after setting these safety settings, right?
Interesting! Gemini won't explicitly filter out your requests, but the LM itself might still refuse due to alignment.
But yeah, I think we should probably have a mechanism for removing the safety filters and instead rely on the Gemini model's own alignment training.
> I do notice Gemini suddenly stops and thinks it is finished after a few turns with these safety settings (actually no patch is generated). I haven't checked for refusals yet.
Oh, that is interesting: I never saw an explicit refusal when using Gemini (an actual response saying so), but I did see intermittent stopping mid-answer, sometimes just a couple of lines in. In some cases I could just message "continue", or I reworded my request. I wasn't aware this might be related to this topic.
I think OpenDevin should include an agent to check and filter out any harmful prompts. As responsible AI progresses, models will generally incorporate their own checks and filters. Therefore, in the future, other models might also refuse to generate code.
> I think OpenDevin should include an agent to check and filter out any harmful prompts. As responsible AI progresses, models will generally incorporate their own checks and filters. Therefore, in the future, other models might also refuse to generate code.
I don't see it as OpenDevin's job to do such a thing, just as an email program or text editor won't prevent you from typing up and sending out "harmful" content. If LLMs object to simple coding tasks, that's an error the vendor needs to fix with better training (data), and I doubt any vendor is actively working to make things worse, as your "Therefore" implies.
Forgot to mention: the LLM is only one level where moderation may take place. Model providers already can/do have their own mechanisms for that, and for some models they offer explicit "self-moderated" access to an LLM as an alternative.
Here is my experience: On the first day that I tried using Gemini 1.5 Flash from OpenDevin, it refused a very benign prompt ('list the folders inside this folder'), saying that the risk of dangerous content was medium. I addressed this (successfully, so far) in a similar way to what lwaekfjlk describes. (That is to say, the language model did not produce a response. Instead, the code raised an exception.)
Yeah, I think the default Gemini filtering settings go overboard, and really malicious things will be stopped by the LM itself anyway. It'd be great to have a PR to address this.
This issue is stale because it has been open for 30 days with no activity. Remove stale label or comment or this will be closed in 7 days.