CiteME is a benchmark designed to test the abilities of language models in finding papers that are cited in scientific texts.

🚀 Get Started

Dataset

The hand curated version of the dataset can be found on citeme.ai.
It contains following columns:

id: A unique id that is used in all our experiments to reference a specific paper.
excerpt: The text excerpt describing the target paper.
target_paper_title: The title of the paper described by the excerpt.
target_paper_url: The URL to the paper described by the excerpt.
source_paper_title: The title of the paper the excerpt was taken from.
source_paper_url: The URL to the paper the excerpt was taken from.
year: The year the source paper was published.
split: Indicates if the sample is from the train or test split.

CiteAgent

Environment variables

CiteAgent requires following environment variables to function properly:

S2_API_KEY: Your semantic scholar api key
OPENAI_API_KEY: Your openai api key (for gpt-4 models)
ANTHROPIC_API_KEY: Your anthropic api key (for claude models)
TOGETHER_API_KEY: Your together api key (for llama models)

Run

Install the required python packages listed in the pyproject.toml.

# via uv (recommended)
uv sync

# or via pip
pip install pyproject.toml

Download the dataset from citeme.ai and place it in the project folder as DATASET.csv.
Run the main.py file.
```
python src/main.py
```

Configuration

To modify the run parameters open src/main.py and update the metadata dict.

To run different models adjust the model entry (e.g. gpt-4o, claude-3-opus-20240229 or meta-llama/Llama-3-70b-chat-hf).

To run the agent without actions change the executor from LLMSelfAskAgentPydantic to LLMNoSearch and adjust the prompt_name to a *_no_search prompt.

📚Citation

If you find our work helpful, please use the following citation:

@inproceedings{press2024citeme,
  title={Cite{ME}: Can Language Models Accurately Cite Scientific Claims?},
  author={Press, Ori and Hochlehnert, Andreas and Prabhu, Ameya and Udandarao, Vishaal and Press, Ofir and Bethge, Matthias},
  booktitle={The Thirty-eight Conference on Neural Information Processing Systems Datasets and Benchmarks Track},
  year={2024}
}

🪪 License

Code: MIT. Check LICENSE. Dataset: CC-BY-4.0. Check LICENSE_DATASET.

CiteME
CiteME copied to clipboard

Metadata

🚀 Get Started

Dataset

CiteAgent

Environment variables

Run

Configuration

📚Citation

If you find our work helpful, please use the following citation:

🪪 License

← Metadata

Owner

Metadata

CiteME CiteME copied to clipboard

Metadata

🚀 Get Started

Dataset

CiteAgent

Environment variables

Run

Configuration

📚Citation

If you find our work helpful, please use the following citation:

🪪 License

← Metadata

Owner

Metadata

CiteME
CiteME copied to clipboard