Add support for other models in AutoEval

Open NivekT opened this issue 1 year ago • 12 comments

🚀 The feature

This is a good task for a new contributor

We have a few utility functions to perform AutoEval:

  • https://github.com/hegelai/prompttools/blob/main/prompttools/utils/autoeval.py
  • https://github.com/hegelai/prompttools/blob/main/prompttools/utils/autoeval_scoring.py
  • https://github.com/hegelai/prompttools/blob/main/prompttools/utils/expected.py

Currently, each of them tends to support only one model. Someone can refactor each of them to support multiple models. I would recommend making sure they all support the best-known models, such as GPT-4 and Claude 2.

We can even consider LLaMA, but that is less urgent.

Tasks

  • [ ] Update this file such that autoeval_binary_scoring can take in model as an argument. Let's make sure gpt-4 and claude-2 are both accepted and invoke the right completion function.
  • [ ] Same as the above but for this file and the function autoeval_scoring. OpenAI needs to be added here.
  • [ ] Same as the above but for this file and the function compute_similarity_against_model
  • [ ] Allow auto evaluation by multiple models (e.g. both gpt-4 and claude-2) at the same time
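
A minimal sketch of how the first and last tasks could fit together, assuming a simple lookup table from model name to completion function. Everything here is hypothetical and for illustration only: `binary_score`, `score_with_models`, `COMPLETION_FNS`, the prompt wording, and the two stub completion functions are not part of prompttools, and the real `autoeval_binary_scoring` has its own signature. The stubs stand in for actual OpenAI and Anthropic API calls so the example runs without keys.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical type: a completion function maps a prompt to the model's text reply.
CompletionFn = Callable[[str], str]


def _stub_openai_completion(prompt: str) -> str:
    """Stand-in for a real OpenAI call (assumption; always answers RIGHT)."""
    return "RIGHT"


def _stub_anthropic_completion(prompt: str) -> str:
    """Stand-in for a real Anthropic call (assumption; always answers WRONG)."""
    return "WRONG"


# Hypothetical registry: which completion function serves which model name.
COMPLETION_FNS: Dict[str, CompletionFn] = {
    "gpt-4": _stub_openai_completion,
    "claude-2": _stub_anthropic_completion,
}


def binary_score(
    response: str,
    model: str = "gpt-4",
    completion_fns: Optional[Dict[str, CompletionFn]] = None,
) -> float:
    """Ask `model` to judge `response`; return 1.0 for RIGHT, 0.0 otherwise."""
    fns = COMPLETION_FNS if completion_fns is None else completion_fns
    if model not in fns:
        raise ValueError(f"Unsupported model: {model!r}. Supported: {sorted(fns)}")
    # Illustrative prompt wording, not the prompt prompttools actually uses.
    prompt = f"Answer RIGHT or WRONG only. Is this response correct?\n\n{response}"
    verdict = fns[model](prompt)
    return 1.0 if verdict.strip().upper().startswith("RIGHT") else 0.0


def score_with_models(response: str, models: List[str]) -> Dict[str, float]:
    """Evaluate one response with several models, keyed by model name."""
    return {model: binary_score(response, model=model) for model in models}


if __name__ == "__main__":
    print(score_with_models("Paris is the capital of France.", ["gpt-4", "claude-2"]))
```

Keeping the dispatch in a dictionary means adding another model only requires registering one more completion function, and the multi-model case reduces to a loop over the single-model function.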

Motivation, pitch

Allowing people to auto-evaluate with different top models would be ideal.

Alternatives

No response

Additional context

No response

NivekT avatar Aug 01 '23 07:08 NivekT

@NivekT I think if we do this together with #31, it might be faster, and we can build on top of LlamaIndex's evals.

What do you think?

rachittshah avatar Aug 01 '23 14:08 rachittshah

So we want to accept an array of models and evaluate against all of them, right?

divij9 avatar Aug 01 '23 14:08 divij9

@rachittshah We can consider adding LlamaIndex's evals if they integrate well with the pattern we have here. Feel free to propose something and we can have a look.

@divij9 That can be part of it, but each of the eval functions linked above currently supports only OpenAI or only Anthropic.

I will update the main issue to break the request into pieces that are easier for first-time contributors to work on.

NivekT avatar Aug 02 '23 01:08 NivekT

I have updated the ask to be bite-size. Feel free to comment if anything is unclear!

NivekT avatar Aug 02 '23 01:08 NivekT

I think I understand what our goal is. Can you please assign this to me?

Divij97 avatar Aug 02 '23 19:08 Divij97

@Divij97 Sure! Let us know if you plan to work on all 4 subtasks or a specific one. Feel free to pick whichever you think you can contribute to. Thanks!

NivekT avatar Aug 02 '23 19:08 NivekT

I'd love to help add support for new models using https://github.com/BerriAI/litellm. Let me know if I can help out on this too.

ishaan-jaff avatar Aug 02 '23 21:08 ishaan-jaff

@ishaan-jaff Awesome! Could you create an issue for it so I can assign it to you? I think the best approach would be to create a LitellmExperiment. You can follow some examples we've done for other APIs:

  • https://github.com/hegelai/prompttools/blob/main/prompttools/experiment/experiments/openai_chat_experiment.py
  • https://github.com/hegelai/prompttools/blob/main/prompttools/experiment/experiments/llama_cpp_experiment.py

steventkrawczyk avatar Aug 02 '23 22:08 steventkrawczyk

Hi @steventkrawczyk, I have made the required changes but I am not able to push. I signed the CLA, but the push still returns a 403 for me.

Divij97 avatar Aug 04 '23 19:08 Divij97

@Divij97 You will need to push to a fork and raise a PR from the fork to the original repo:

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/proposing-changes-to-your-work-with-pull-requests/creating-a-pull-request-from-a-fork

steventkrawczyk avatar Aug 04 '23 19:08 steventkrawczyk

Never mind, it was a keychain issue with my stupid Mac. Can you please take a look at this PR: https://github.com/hegelai/prompttools/pull/59. It's not complete, but I wanted to understand whether I am headed in the right direction.

Divij97 avatar Aug 04 '23 19:08 Divij97

Wanted to give an update on my progress. I am done with all the changes but would like some help testing them for Anthropic. How do I generate an Anthropic key to test with?

Divij97 avatar Aug 08 '23 21:08 Divij97