bugbug [code_review] Explore different models

Similar to #4582, but across different models.

This depends on #4580 for the evaluation.

Nov 01 '24 10:11 marco-c

This is already covered by the experimental mode: we return the results generated by each model/configuration to the user for evaluation.

The following are the configurations in the experimental mode:

* gpt-4o temp 0.2
* gpt-4o temp 0.8
* claude-3-5 temp 0.2
* ~gemini-1.5-pro temp 0.2~ (disabled due to a quota limitation error)
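A minimal Python sketch of the experimental-mode fan-out described above follows. It assumes a hypothetical `generate(patch, config)` callable that wraps the model API. `ModelConfig`, `EXPERIMENTAL_CONFIGS`, and `run_experimental_review` are illustrative names, not bugbug's actual code. Each enabled model/temperature pair reviews the same patch, and the results are returned side by side, keyed by configuration, so the user can compare them.

```python
from collections.abc import Callable, Sequence
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelConfig:
    """One model/temperature pairing offered in experimental mode (hypothetical structure)."""

    model: str
    temperature: float
    enabled: bool = True


# Configurations listed above; the Gemini entry mirrors the quota-related
# disablement and is skipped by run_experimental_review.
EXPERIMENTAL_CONFIGS: tuple[ModelConfig, ...] = (
    ModelConfig("gpt-4o", 0.2),
    ModelConfig("gpt-4o", 0.8),
    ModelConfig("claude-3-5", 0.2),
    ModelConfig("gemini-1.5-pro", 0.2, enabled=False),
)


def run_experimental_review(
    patch: str,
    generate: Callable[[str, ModelConfig], list[str]],
    configs: Sequence[ModelConfig] = EXPERIMENTAL_CONFIGS,
) -> dict[str, list[str]]:
    """Run every enabled configuration on the same patch and collect the
    generated comments, keyed by a readable configuration label."""
    results: dict[str, list[str]] = {}
    for config in configs:
        if not config.enabled:
            continue  # e.g. a model blocked by a quota error
        label = f"{config.model} temp {config.temperature}"
        results[label] = generate(patch, config)
    return results


if __name__ == "__main__":
    # Stand-in generator; a real one would call the model provider's API.
    def fake_generate(patch: str, config: ModelConfig) -> list[str]:
        return [f"[{config.model} @ {config.temperature}] looks fine"]

    for label, comments in run_experimental_review("diff --git a/x b/x", fake_generate).items():
        print(label, comments)
```

Keeping the `enabled` flag on each configuration, rather than deleting the entry, makes it a one-line change to switch a model back on once a quota issue is resolved.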

Nov 27 '24 01:11 suhaibmujahid

I deployed a new version of review helper that re-enables gemini-1.5-pro.

The experimental mode now uses the following configurations:

* gpt-4o temp 0.2
* gpt-4o temp 0.8
* claude-3-5 temp 0.2
* gemini-1.5-pro temp 0.2

Jan 08 '25 15:01 suhaibmujahid

After https://github.com/mozilla/bugbug/pull/4731, the Gemini configuration will use Gemini 2.0 Flash instead of Gemini 1.5 Pro.

Jan 08 '25 15:01 marco-c

@suhaibmujahid, could you share the latest stats here, along with the query you're using to get them, so I can easily run it myself as well?

Apr 22 '25 12:04 marco-c