Add support for other models in AutoEval
🚀 The feature
This is a good task for a new contributor
We have a few utility functions to perform AutoEval:
- https://github.com/hegelai/prompttools/blob/main/prompttools/utils/autoeval.py
- https://github.com/hegelai/prompttools/blob/main/prompttools/utils/autoeval_scoring.py
- https://github.com/hegelai/prompttools/blob/main/prompttools/utils/expected.py
Currently, each of them tends to support only one model. Someone can refactor the code so that each of them supports multiple models. I would recommend making sure they all support the best-known models, such as GPT-4 and Claude 2. We could even consider LLaMA, but that is less urgent.
Tasks
- [ ] Update this file so that `autoeval_binary_scoring` can take in `model` as an argument. Let's make sure `gpt-4` and `claude-2` are both accepted and invoke the right completion function (see the sketch after this list).
- [ ] Same as the above, but for this file and the function `autoeval_scoring`. OpenAI needs to be added here.
- [ ] Same as the above, but for this file and the function `compute_similarity_against_model`.
- [ ] Allow auto-evaluation by multiple models (e.g. both `gpt-4` and `claude-2`) at the same time.
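For anyone picking this up, here is a minimal sketch of the shape the refactor could take. The helper names `_openai_completion` and `_anthropic_completion` are placeholders standing in for the completion calls that already exist in the linked files, and the signature of `autoeval_binary_scoring` is illustrative, not the actual prompttools API:

```python
# Illustrative sketch only -- helper names and signatures are hypothetical,
# standing in for the completion calls already used in the linked files.
from typing import Dict, List, Union


def _openai_completion(prompt: str, model: str) -> str:
    # Placeholder for the OpenAI call currently in autoeval.py
    raise NotImplementedError


def _anthropic_completion(prompt: str, model: str) -> str:
    # Placeholder for the Anthropic call currently in autoeval_scoring.py
    raise NotImplementedError


# Map each supported eval model to the completion function that serves it.
_COMPLETION_FNS = {
    "gpt-4": _openai_completion,
    "claude-2": _anthropic_completion,
}


def autoeval_binary_scoring(
    prompt: str, response: str, model: Union[str, List[str]] = "gpt-4"
) -> Dict[str, str]:
    """Score a response with one or more eval models.

    Accepts a single model name or a list of names; a list fans the
    evaluation out to every model and returns one score per model.
    """
    models = [model] if isinstance(model, str) else model
    scores = {}
    for m in models:
        try:
            complete = _COMPLETION_FNS[m]
        except KeyError:
            raise ValueError(f"Unsupported eval model: {m}")
        scores[m] = complete(f"Evaluate:\n{response}\nfor prompt:\n{prompt}", m)
    return scores
```

The dictionary dispatch keeps adding a third provider down to one new entry, and accepting `Union[str, List[str]]` covers the multi-model subtask without a second code path.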
Motivation, pitch
Allowing people to auto-evaluate with the best available models would be ideal.
Alternatives
No response
Additional context
No response
@NivekT I think if we add this with #31, it might be faster and we can build on top of LlamaIndex's evals.
What do you think?
So we want to accept an array of models and evaluate against all of them, right?
@rachittshah We can consider adding LlamaIndex's evals if they integrate well with the pattern we have here. Feel free to propose something and we can have a look.
@divij9 That can be part of it, but each of the eval functions linked above currently supports only OpenAI or only Anthropic.
I will update the main issue to break the request into pieces that are easier for first-time contributors to work on.
I have updated the ask to be bite-sized. Feel free to comment if anything is unclear!
I think I understand what our goal is. Can you please assign this to me?
@Divij97 Sure! Let us know if you plan to work on all 4 subtasks or a specific one. Feel free to pick whichever ones you think you can contribute to. Thanks!
I'd love to help add support for new models using https://github.com/BerriAI/litellm. Let me know if I can help out on this too.
@ishaan-jaff Awesome! Could you create an issue for it so I can assign it to you? I think the best approach would be to create a `LitellmExperiment`; you can follow some examples we've done for other APIs:
- https://github.com/hegelai/prompttools/blob/main/prompttools/experiment/experiments/openai_chat_experiment.py
- https://github.com/hegelai/prompttools/blob/main/prompttools/experiment/experiments/llama_cpp_experiment.py
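A rough sketch of what that could look like, assuming the same cartesian-product pattern as the linked experiments. The class structure here is an assumption based on those examples, not the actual prompttools base-class internals; `litellm.completion` is LiteLLM's real entry point and takes OpenAI-style `model` and `messages` arguments:

```python
# Illustrative sketch -- the experiment-class structure is assumed from the
# linked OpenAI/llama.cpp examples, not copied from prompttools internals.
import itertools

import litellm


class LitellmExperiment:
    """Runs every combination of model x messages through litellm.completion."""

    def __init__(self, models, messages_list, **completion_kwargs):
        self.models = models
        self.messages_list = messages_list
        self.completion_kwargs = completion_kwargs
        self.results = []

    def run(self):
        # Take the cartesian product of models and message lists,
        # mirroring how the existing experiments sweep their arguments.
        for model, messages in itertools.product(self.models, self.messages_list):
            response = litellm.completion(
                model=model, messages=messages, **self.completion_kwargs
            )
            self.results.append(
                {"model": model, "messages": messages, "response": response}
            )
        return self.results


# Usage: one experiment fans out over both providers via LiteLLM's unified API.
exp = LitellmExperiment(
    models=["gpt-4", "claude-2"],
    messages_list=[[{"role": "user", "content": "Hello!"}]],
)
# results = exp.run()  # requires the relevant provider API keys to be set
```

The appeal of LiteLLM here is that one `completion` call covers OpenAI, Anthropic, and others, so the multi-model eval subtasks above could share a single backend.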
Hi @steventkrawczyk, I have made the required changes but I am not able to push. I signed the CLA, but the push still returns a 403 for me.
@Divij97 you will need to push to a fork and raise a PR from the fork to the original repo
https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/proposing-changes-to-your-work-with-pull-requests/creating-a-pull-request-from-a-fork
Never mind, it was a keychain issue with my stupid Mac. Can you please take a look at this PR: https://github.com/hegelai/prompttools/pull/59. It's not complete, but I wanted to check whether I am headed in the right direction.
Wanted to give an update on my progress. I am done with all the changes, but I'd like some help testing them against Anthropic. How do I generate an Anthropic key to test with?
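For reference: Anthropic API keys are created from the Anthropic console (https://console.anthropic.com), assuming your account has API access. A quick smoke test with the claude-2-era `anthropic` client looks roughly like this; whether prompttools itself reads the `ANTHROPIC_API_KEY` environment variable is an assumption, though it is the anthropic client's conventional default:

```python
# Minimal smoke test for an Anthropic key, using the completions API that
# claude-2 shipped with. Assumes the key is exported as ANTHROPIC_API_KEY.
import os

import anthropic

client = anthropic.Anthropic(api_key=os.environ["ANTHROPIC_API_KEY"])
completion = client.completions.create(
    model="claude-2",
    max_tokens_to_sample=100,
    prompt=f"{anthropic.HUMAN_PROMPT} Say hello. {anthropic.AI_PROMPT}",
)
print(completion.completion)
```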