auto-evaluator
auto-evaluator copied to clipboard
Auto-evaluator :brain: :memo:
This is a lightweight evaluation tool for question-answering using Langchain to:
-
Ask the user to input a set of documents of interest
-
Use an LLM (
GPT-3.5-turbo) to auto-generatequestion``answerpairs from these docs -
Generate a question-answering chain with a specified set of UI-chosen configurations
-
Use the chain to generate a response to each
question -
Use an LLM (
GPT-3.5-turbo) to score the response relative to theanswer -
Explore scoring across various chain configurations
Run as Streamlit app
pip install -r requirements.txt
streamlit run auto-evaluator.py
Inputs
num_eval_questions - Number of question to auto-generate (if the user does not supply an eval set)
split_method - Method for text splitting
chunk_chars - Chunk size for text splitting
overlap - Chunk overlap for text splitting
embeddings - Embedding method for chunks
retriever_type - Chunk retrival method
num_neighbors - Neighbors in retrivial
model - LLM for summarization from retrived chunks
grade_prompt - Promp choice for grading
Blog
https://blog.langchain.dev/auto-eval-of-question-answering-tasks/
UI

Disclaimer
You will need an OpenAI API key with with access to `GPT-4` and an Anthropic API key to take advantage of all of the default dashboard model settings. However, additional models (e.g., from Hugging Face) can be easily added to the app.