llms-benchmarking topic

cc_flows

Stars: 30, Forks: 1

The data and implementation for the experiments in the paper "Flows: Building Blocks of Reasoning and Collaborating AI".

ChemLLMBench

Stars: 95, Forks: 5

What can Large Language Models do in chemistry? A comprehensive benchmark on eight tasks

chem-bench

Stars: 31, Forks: 1

How good are LLMs at chemistry?

resta

Stars: 20, Forks: 1

Restore safety in fine-tuned language models through task arithmetic

parea-sdk-py

Stars: 41, Forks: 4

Python SDK for experimenting, testing, evaluating & monitoring LLM-powered applications - Parea AI (YC S23)