[code_review] Explore different models
Similar to #4582, but across different models.
This depends on #4580 for the evaluation.
This is already part of the experimental mode: we return the results generated by each of the models/configurations to the user for evaluation.
The following are the configurations in the experimental mode:
- gpt-4o temp 0.2
- gpt-4o temp 0.8
- claude-3-5 temp 0.2
- ~gemini-1.5-pro temp 0.2~ (disabled due to a quota limitation error)
I deployed a new version of review helper that re-enables gemini-1.5-pro.
So currently, the following are the configurations in the experimental mode:
- gpt-4o temp 0.2
- gpt-4o temp 0.8
- claude-3-5 temp 0.2
- gemini-1.5-pro temp 0.2
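The fan-out described above (run every configuration on the same patch and hand all the outputs back to the user for evaluation) could be sketched roughly as follows. This is a minimal illustration only: `ModelConfig`, `EXPERIMENTAL_CONFIGS`, `run_experimental_mode` and the `generate_review` callback are hypothetical names, not the actual review helper code, and the real LLM call is left to the caller.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class ModelConfig:
    # Hypothetical structure; the real review helper config may differ.
    model: str
    temperature: float


# The experimental-mode configurations listed above.
EXPERIMENTAL_CONFIGS: tuple[ModelConfig, ...] = (
    ModelConfig("gpt-4o", 0.2),
    ModelConfig("gpt-4o", 0.8),
    ModelConfig("claude-3-5", 0.2),
    ModelConfig("gemini-1.5-pro", 0.2),
)


def run_experimental_mode(
    patch: str,
    generate_review: Callable[[ModelConfig, str], str],
    configs: tuple[ModelConfig, ...] = EXPERIMENTAL_CONFIGS,
) -> dict[str, str]:
    """Run every configuration on the same patch and collect the outputs.

    `generate_review` is a hypothetical callback wrapping the actual LLM call.
    The returned mapping (label -> generated review) is what would be shown
    to the user for evaluation.
    """
    results: dict[str, str] = {}
    for config in configs:
        label = f"{config.model} temp {config.temperature}"
        results[label] = generate_review(config, patch)
    return results


if __name__ == "__main__":
    # Stand-in generator so the sketch runs without any API keys.
    def fake_generate(config: ModelConfig, patch: str) -> str:
        return f"review of {patch!r} by {config.model}"

    for label, review in run_experimental_mode("diff", fake_generate).items():
        print(label, "->", review)
```

Keeping the configurations as plain data makes swapping a model (for example, replacing one entry with another Gemini model) a one-line change.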
And after https://github.com/mozilla/bugbug/pull/4731, it's going to be Gemini 2.0 Flash instead of Gemini 1.5 Pro.
@suhaibmujahid could you share the latest stats here, and the query you're using to get them (so I can run it myself easily as well)?