llms-benchmarking topic
llms-benchmarking repositories
cc_flows
Stars: 30, Forks: 1
The data and implementation for the experiments in the paper "Flows: Building Blocks of Reasoning and Collaborating AI".
ChemLLMBench
Stars: 95, Forks: 5
What can Large Language Models do in chemistry? A comprehensive benchmark on eight tasks
resta
Stars: 20, Forks: 1
Restore safety in fine-tuned language models through task arithmetic
parea-sdk-py
Stars: 41, Forks: 4
Python SDK for experimenting, testing, evaluating & monitoring LLM-powered applications - Parea AI (YC S23)