FM-Leaderboard-er lets you create a leaderboard to find the best LLM/prompt for your own business use case, based on your own data, tasks, and prompts.
FM-Leaderboard-er
Create your own private LLM leaderboard! 📊
Introduction
There's no one-size-fits-all leaderboard. FM-Leaderboard-er lets you find the best LLM for your own business use case based on your own tasks, prompts, and data.
Features:
- Tasks - Example notebooks for common tasks like Summarization, Classification, and RAG (coming soon).
- Models - Amazon Bedrock, OpenAI, any API (with a code integration).
- Metrics - Built-in metrics per task + custom metrics (via a code integration).
- Latency - Latency metric per model.
- Cost - Cost comparison.
- Prompts - Compare several prompts on a single model.
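As a rough illustration of the latency and custom-metric features listed above, the following sketch times each model call and ranks models by a score. It does not use this project's API: `time_call`, `exact_match`, `rank_models`, and the shape of the `models` dictionary (a name mapped to a plain Python callable) are all hypothetical and chosen only for this example.

```python
import time
from typing import Callable, Dict, List, Tuple


def time_call(fn: Callable[[str], str], prompt: str) -> Tuple[str, float]:
    """Call fn(prompt) and return (output, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    output = fn(prompt)
    return output, time.perf_counter() - start


def exact_match(prediction: str, reference: str) -> float:
    """Hypothetical custom metric: 1.0 if the normalized strings match, else 0.0."""
    return float(prediction.strip().lower() == reference.strip().lower())


def rank_models(
    models: Dict[str, Callable[[str], str]],
    examples: List[Tuple[str, str]],
) -> List[dict]:
    """Score every model on (prompt, reference) pairs and sort the results.

    Rows are ordered by mean score (highest first); ties are broken by
    mean latency (lowest first).
    """
    rows = []
    for name, fn in models.items():
        scores, latencies = [], []
        for prompt, reference in examples:
            output, elapsed = time_call(fn, prompt)
            scores.append(exact_match(output, reference))
            latencies.append(elapsed)
        rows.append(
            {
                "model": name,
                "score": sum(scores) / len(scores),
                "avg_latency_s": sum(latencies) / len(latencies),
            }
        )
    return sorted(rows, key=lambda r: (-r["score"], r["avg_latency_s"]))
```

In practice each callable would wrap a real model invocation (for example a Bedrock or OpenAI client call), and `exact_match` would be replaced by whichever metric suits the task.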
Getting Started
Prerequisites
- AWS account with Amazon Bedrock access to selected models.
- Hugging Face access token
The code downloads a dataset from Hugging Face (https://huggingface.co/api/datasets/Salesforce/dialogstudio), which requires an access token. If you don't have one yet, follow these steps:
- Sign up for Hugging Face: https://huggingface.co
- Generate an access token (save it for later use): https://huggingface.co/settings/tokens
- Store the access token locally by installing the huggingface_hub Python library and running the following from a shell:
> pip install huggingface_hub
> python -c "from huggingface_hub.hf_api import HfFolder; HfFolder.save_token('YOUR_HUGGINGFACE_TOKEN')"
(Verify that you now have: ~/.cache/huggingface)
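The verification step above can also be scripted. The sketch below uses only the standard library and assumes the usual huggingface_hub layout: the token is stored in a file named `token` under `~/.cache/huggingface`, the `HF_HOME` environment variable can relocate that directory, and `HF_TOKEN` can supply the token directly. These details may differ between library versions, and `find_hf_token` is a hypothetical helper, not part of this repository.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def find_hf_token(env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return a locally available Hugging Face token, or None if none is found.

    Assumed lookup order (may vary by huggingface_hub version):
    1. the HF_TOKEN environment variable
    2. the file "<HF_HOME>/token", where HF_HOME defaults to ~/.cache/huggingface
    """
    env = os.environ if env is None else env

    token = env.get("HF_TOKEN")
    if token and token.strip():
        return token.strip()

    hf_home = env.get("HF_HOME") or str(Path.home() / ".cache" / "huggingface")
    token_file = Path(hf_home).expanduser() / "token"
    if token_file.is_file():
        # An empty file counts as "no token".
        return token_file.read_text().strip() or None
    return None


if __name__ == "__main__":
    # Report presence only; never print the token itself.
    print("token found" if find_hf_token() else "no token found")
```

Running the script before opening the notebooks gives a quick signal on whether the dataset download is likely to authenticate.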
Installation
- Clone the repository:
git clone https://github.com/aws-samples/fm-leaderboarder.git
Usage
To get started, open the example-1 notebook and follow the instructions provided.
Architecture
Coming soon.
Dependency on third party libraries and services
This code can interact with the OpenAI service which has terms published here and pricing described here. You should be familiar with the pricing and confirm that your use case complies with the terms before proceeding.
This repository makes use of the aws/fmeval Foundation Model Evaluations Library. Please review any license terms applicable to the dataset with your legal team and confirm that your use case complies with the terms before proceeding.
Security
See CONTRIBUTING for more information.
Contributing
Contributions to FM-Leaderboarder are welcome! Please refer to the CONTRIBUTING.md file for guidelines on how to contribute.
License
This project is licensed under the Apache-2.0 License.