Logical and Abstract Reasoning
Repository for the evaluation of Large Language Models on logical and abstract reasoning tasks
Installation
To get the code, clone the repository:
git clone https://github.com/Strong-AI-Lab/Logical-and-abstract-reasoning.git
To install the dependencies in a virtual environment, use the following:
cd Logical-and-abstract-reasoning
python -m venv env/
source env/bin/activate
pip install -r requirements.txt
You may also need to install transformers directly from its GitHub repository:
pip install git+https://github.com/huggingface/transformers
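For convenience, the steps above can be run as one sequence (the install of transformers from source is only needed if the pinned release is not sufficient):

```bash
# Clone the repository and set up an isolated environment
git clone https://github.com/Strong-AI-Lab/Logical-and-abstract-reasoning.git
cd Logical-and-abstract-reasoning
python -m venv env/
source env/bin/activate

# Install the pinned dependencies
pip install -r requirements.txt

# Optional: install transformers from source
pip install git+https://github.com/huggingface/transformers
```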
Use
Evaluation
To evaluate a model in the repository, use the following command:
python run_evaluation.py config/model/<model_config.yaml> config/data/<data_config.yaml> --<kwarg_name> <kwarg>
Choose the model to evaluate with the <model_config.yaml> file and the dataset to evaluate it on with the <data_config.yaml> file. Any additional arguments can be passed as keyword arguments (e.g. a private API key for the GPT models).
By default, all results are saved to a CSV file in the logs/ folder. You can re-compute the metrics of an evaluation run from this file with the following command:
python src/evaluate/evaluator.py logs/<results_file.csv>
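As a concrete sketch, an evaluation run followed by a metric re-computation could look like this. The config file names and the --api_key argument below are hypothetical placeholders; use the files actually present in config/model/ and config/data/ and the keyword arguments your model requires:

```bash
# Hypothetical example: replace the placeholder config names with real files
# from config/model/ and config/data/.
python run_evaluation.py config/model/gpt-4.yaml config/data/reclor.yaml --api_key <YOUR_OPENAI_KEY>

# Re-compute the metrics from the CSV produced by the run above
# (the exact file name depends on the run).
python src/evaluate/evaluator.py logs/<results_file.csv>
```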
Fine-tuning
To fine-tune a model on a given dataset, run the following:
python run_finetuning.py config/model/<model_config.yaml> config/data/<data_config.yaml> config/trainer/<trainer_config.yaml>
The configuration files work the same way as for evaluation. The <model_config.yaml> file contains additional configuration for training. The logs are saved in fine-tuning-output/ and the model weights are saved in fine-tuning-saves/.
Currently, only HuggingFace models can be fine-tuned.
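As a sketch, a fine-tuning run on a HuggingFace model could be launched as follows; the config file names are again hypothetical placeholders:

```bash
# Hypothetical example: replace the placeholder config names with files
# from config/model/, config/data/ and config/trainer/.
python run_finetuning.py config/model/llama.yaml config/data/logiqa.yaml config/trainer/default.yaml

# Training logs are written to fine-tuning-output/,
# model weights to fine-tuning-saves/.
```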
LLaMA-based model instruction fine-tuning
We use the Stanford Alpaca training script for instruction fine-tuning of LLaMA-based models. If you want to instruction-fine-tune a LLaMA-based model, you can do so by following this link.
Models
| Inference Type | Model | Size | Task | Link | Remark |
|---|---|---|---|---|---|
| Logical Reasoning on Reading Comprehension | MERIt | - | Reading Comprehension | paper project | #3 on the ReClor leaderboard |
| | LReasoner | - | Reading Comprehension | paper project | #6 on the ReClor leaderboard |
| | AMR-LE | - | Reading Comprehension | project | #2 and #5 on the ReClor leaderboard |
| | LLaMA | - | Reading Comprehension | paper code | Open-source large language model |
| | LLaMA2 | - | Reading Comprehension | paper code | Open-source large language model |
| | TinyLLaMA | - | Reading Comprehension | paper code | Open-source small language model |
| | Alpaca | - | Reading Comprehension | code | Fine-tuned LLaMA |
| | Vicuna | - | Reading Comprehension | project code | Fine-tuned LLaMA |
| | ChatGPT | - | Reading Comprehension | paper project | Use the API to do prompt tuning |
| | GPT-4 | - | Reading Comprehension | paper project | Use the API to do prompt tuning |
| | Zephyr-7b-beta | - | Reading Comprehension | code | Fine-tuned Mistral-7B |
Datasets & Benchmarks
| Inference Type | Dataset | Size | Task | Link | Remark |
|---|---|---|---|---|---|
| Logical Reasoning on Reading Comprehension | ReClor | - | Reading Comprehension | paper project | Logical reasoning reading comprehension |
| | LogiQA | - | Reading Comprehension | paper project | Logical reasoning reading comprehension |
| | LogiQA V2 | - | Reading Comprehension | project | Logical reasoning reading comprehension |
| | LogiQA Logical Reasoning Plus | - | Reading Comprehension | project | Logical reasoning reading comprehension for out-of-distribution evaluation |
| Abstract Reasoning | ARC | - | Abstract Reasoning | paper code | Text version of a Visual Abstract Reasoning task |
| | ACRE | - | Abstract Reasoning | paper code | Text version of a Visual Abstract Reasoning task |
| | PVR | - | Abstract Reasoning | paper | Abstract Reasoning task |
| | RAVEN | - | Abstract Reasoning | paper project | Text version of a Visual Abstract Reasoning task |
| | Diagrammatic Logic | - | Abstract Reasoning | code | Extracted from OpenAI Evals |
| | Logic | - | Abstract Reasoning | code | Extracted from OpenAI Evals |
| | Logic Statements | - | Abstract Reasoning | code | Extracted from OpenAI Evals |
| | Pattern Identification | - | Abstract Reasoning | code | Extracted from OpenAI Evals |
| | String Patterns | - | Abstract Reasoning | code | Extracted from OpenAI Evals |
| | List Functions | - | Abstract Reasoning | code | Extracted from Google BIG-bench |
Acknowledgement
Our proposed new dataset, logiqa-logical-reasoning-plus, has been merged into OpenAI/Evals.