Steven Krawczyk (Hegel AI)
Hey @rachittshah, we'd love your help! I think you're right that we'll need to build prompttools abstractions on top of the llamaindex abstractions. I'm hoping there are a few abstractions...
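To make that concrete, here is a rough sketch of the kind of parameter sweep a prompttools experiment could wrap around a llama_index query engine. The swept parameter (`similarity_top_k`), the queries, and the result schema are illustrative assumptions, not a final design.

```python
# Rough sketch of a prompttools-style sweep over a llama_index query engine.
from itertools import product

from llama_index import SimpleDirectoryReader, VectorStoreIndex

documents = SimpleDirectoryReader("data").load_data()
index = VectorStoreIndex.from_documents(documents)

queries = ["What does the design doc say about retries?"]  # example inputs
top_k_values = [1, 3, 5]  # parameter we sweep, like a prompttools experiment argument

results = []
for query, top_k in product(queries, top_k_values):
    query_engine = index.as_query_engine(similarity_top_k=top_k)
    response = query_engine.query(query)
    results.append({"query": query, "similarity_top_k": top_k, "response": str(response)})
```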
Thanks for bringing this issue up; I'm looking into it.
Which model in particular are you trying to use? The Hugging Face Hub experiment uses the Inference API, so it will only support models that the API supports. Is there a...
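In the meantime, a quick way to check whether a model is actually served by the hosted Inference API is to hit its public endpoint directly. The model id and token env var below are examples only.

```python
# Quick check that the hosted Inference API can serve a given model; the Hugging Face
# Hub experiment calls this same API under the hood.
import os
import requests

model_id = "google/flan-t5-small"  # swap in the model you're trying to run
resp = requests.post(
    f"https://api-inference.huggingface.co/models/{model_id}",
    headers={"Authorization": f"Bearer {os.environ['HUGGINGFACEHUB_API_TOKEN']}"},  # example env var name
    json={"inputs": "Hello, world"},
    timeout=30,
)
# A 200 with generated text means the API supports the model; a 4xx error usually means it doesn't.
print(resp.status_code, resp.json())
```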
Have you downloaded the model `llama-2-7b-chat.ggmlv3.q2_K.bin` and followed the setup instructions at https://github.com/ggerganov/llama.cpp and https://github.com/abetlen/llama-cpp-python?
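If the download and install went through, a minimal load-and-generate check with llama-cpp-python looks roughly like this (the model path is wherever you saved the file):

```python
# Minimal smoke test that llama-cpp-python can load the downloaded GGML weights.
from llama_cpp import Llama

llm = Llama(model_path="./llama-2-7b-chat.ggmlv3.q2_K.bin")  # adjust to your local path
output = llm("Q: What is the capital of France? A:", max_tokens=32, stop=["Q:"])
print(output["choices"][0]["text"])
```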
Thanks for raising this @RigvedRocks! I'll review it now. When you get a chance, could you fill out the CLA?
I'm not very familiar with promptbench, but it looks like you want to run attacks as experiments and use your eval function to evaluate the experiment outputs. You can check...
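Roughly, the pattern I have in mind is: each attack variant becomes one run, and your eval function scores the collected outputs afterwards. The `run_attack` helper and the row schema below are placeholders, not promptbench or prompttools APIs.

```python
# Schematic: one row per attack variant, scored by a user-supplied eval function.
def run_attack(prompt: str) -> str:
    """Placeholder for the promptbench attack + model call."""
    return "model response under attack"

def my_eval_fn(prompt: str, response: str) -> float:
    # e.g. 1.0 if the model still answers correctly under the perturbed prompt
    return float("correct answer" in response.lower())

attack_prompts = ["original prompt", "perturbed prompt #1", "perturbed prompt #2"]
rows = []
for prompt in attack_prompts:
    response = run_attack(prompt)
    rows.append({"prompt": prompt, "response": response, "score": my_eval_fn(prompt, response)})
```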
Hey @Divij97, this looks great! Very elegant way to support Anthropic + OpenAI as evaluators. I'm guessing Claude and GPT will need different eval prompts, but this is definitely headed...
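One way that could look (wording and model names are purely illustrative): keep one eval prompt per evaluator family, since the Claude completions API expects Human:/Assistant: formatting while GPT chat models take a message list.

```python
# Sketch of per-evaluator eval prompts; templates are illustrative only.
EVAL_PROMPTS = {
    "gpt-4": [
        {"role": "system", "content": "You are grading a model response. Reply with PASS or FAIL."},
        {"role": "user", "content": "Prompt: {prompt}\nResponse: {response}"},
    ],
    "claude-2": (
        "\n\nHuman: Grade the following model response. Reply with PASS or FAIL.\n"
        "Prompt: {prompt}\nResponse: {response}\n\nAssistant:"
    ),
}
```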
@LuvvAggarwal Sure thing. The scope of this one is a bit large because we currently don't have any common benchmarks. I think a simple case would be the following *...
@LuvvAggarwal Using `datasets` sounds like a good start. As for `evaluate`, we want to write our own eval methods that support more than just Hugging Face models (e.g. OpenAI, Anthropic).
For example, if you are using the hellaswag dataset, we'd need to compute the accuracy of the predictions, e.g. https://github.com/openai/evals/blob/main/evals/metrics.py#L12
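Roughly something like this, where the field names are assumed from the hellaswag dataset card and `predict` is a placeholder for whichever provider is being benchmarked:

```python
# Rough sketch: load hellaswag via `datasets` and compute plain accuracy,
# mirroring the accuracy helper linked above.
from datasets import load_dataset

ds = load_dataset("hellaswag", split="validation")

def predict(example) -> int:
    """Placeholder for the model call; returns the index of the chosen ending."""
    return 0

correct = sum(predict(ex) == int(ex["label"]) for ex in ds)
print(f"accuracy: {correct / len(ds):.3f}")
```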