AIrXiv
AI-powered arXiv research assistant prototype.
AIrXiv is a prototype of an LLM-powered arXiv research assistant. It is an Electron app with a Flask backend powered by the OpenAI API. AIrXiv requires the user to supply their own OpenAI API key.

Contents
- Implementation Notes
- Installation
- Usage
- License
Implementation Notes
- AIrXiv attempts to extract the TeX source of the papers when they are added to the database. If this fails, PDF extraction is used.
- AIrXiv uses a simple top-k similarity search (k = 3 by default) from a FAISS vector store. k can be set in config.yml, along with the chunk size and stride length of chunks (512 and 384 by default).
- Frontend elements are in static, and backend elements are in util and main.py.
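The chunking and top-k retrieval described in the notes above can be sketched roughly as follows. This is a minimal illustration, not AIrXiv's actual code: the function names (chunk_text, top_k) are hypothetical, a brute-force cosine-similarity search stands in for the FAISS vector store, and the similarity metric and the unit of the chunk size (tokens versus characters) are assumptions.

```python
import math
from typing import List, Sequence, Tuple


def chunk_text(tokens: Sequence[str], chunk_size: int = 512, stride: int = 384) -> List[List[str]]:
    """Split a sequence into overlapping windows (hypothetical helper).

    Consecutive chunks start `stride` items apart, so with the defaults
    neighbouring chunks overlap by 512 - 384 = 128 items.
    """
    if chunk_size <= 0 or stride <= 0:
        raise ValueError("chunk_size and stride must be positive")
    chunks = []
    for start in range(0, len(tokens), stride):
        chunks.append(list(tokens[start:start + chunk_size]))
        if start + chunk_size >= len(tokens):
            break  # this window already reaches the end of the input
    return chunks


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query: Sequence[float], vectors: Sequence[Sequence[float]], k: int = 3) -> List[Tuple[int, float]]:
    """Return (index, score) pairs for the k vectors most similar to the query.

    Brute-force stand-in for a vector-store lookup (assumption: cosine metric).
    """
    scored = [(i, cosine(query, v)) for i, v in enumerate(vectors)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

In this sketch, each chunk would be embedded into a vector, and the chunks returned by top_k would be passed to the model as context for the question. A larger k supplies more context per query at the cost of more prompt tokens.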
Installation
Please open an Issue if you are having problems with installation.
- (Prerequisites) Make sure you have Node.js (LTS version recommended) and Python >=3.7.
- Clone and cd into the repo:
git clone https://github.com/smsharma/AIrXiv.git
cd AIrXiv
- Install the required Python packages using the provided environment.yml file. You can use Conda or any other environment manager of your choice:
conda env create -f environment.yml
conda activate airxiv
- Install the required Node.js packages by running the following command:
npm install
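If installation fails, it can help to confirm the prerequisites first. The script below is a small sketch and not part of the repository: check_prerequisites is a hypothetical helper that checks only the Python version and whether node and npm are on the PATH.

```python
import shutil
import sys
from typing import List, Tuple

MIN_PYTHON = (3, 7)  # minimum version stated in the prerequisites


def check_prerequisites(min_python: Tuple[int, int] = MIN_PYTHON) -> List[str]:
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    if sys.version_info[:2] < min_python:
        found = ".".join(str(part) for part in sys.version_info[:2])
        wanted = ".".join(str(part) for part in min_python)
        problems.append(f"Python {wanted}+ required, found {found}")
    for tool in ("node", "npm"):
        if shutil.which(tool) is None:
            problems.append(f"{tool} not found on PATH")
    return problems


if __name__ == "__main__":
    issues = check_prerequisites()
    for issue in issues:
        print(issue)
    if not issues:
        print("Prerequisites look fine.")
```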
Usage
Run the Electron app with
npm run dev
which launches the Python/Flask backend (python main.py or npm run start-flask) as well as the frontend (npm start). If this fails, try running the two commands separately. Add an arXiv ID or two, enter your OpenAI API Key in the text box towards the bottom, and start asking questions!
Usage notes:
- Either gpt-3.5-turbo or gpt-4 can be selected in the app settings. gpt-4 is significantly better, in particular at implementing code, but is about an order of magnitude more expensive (~$0.02/1000 tokens) compared to gpt-3.5-turbo, and additionally API access is subject to a waitlist.
- Paper querying can be turned off by checking "Don't query papers" in the settings. The app then relies only on the general capabilities of the model.
- All model output should be checked for correctness before it is relied on.
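To get a feel for the cost difference mentioned above, here is a back-of-the-envelope sketch. The prices are assumptions: it reads the approximate figure of $0.02 per 1000 tokens as the gpt-4 price and takes one tenth of that for gpt-3.5-turbo, following the "order of magnitude" remark. Actual OpenAI pricing may differ and should be checked before relying on these numbers.

```python
# Assumed prices in USD per 1000 tokens (illustrative, not authoritative).
GPT4_PRICE_PER_1K = 0.02
GPT35_PRICE_PER_1K = GPT4_PRICE_PER_1K / 10  # assumption: roughly 10x cheaper


def estimate_cost(num_tokens: int, price_per_1k_usd: float) -> float:
    """Estimate the cost in USD of processing `num_tokens` tokens."""
    if num_tokens < 0:
        raise ValueError("num_tokens must be non-negative")
    return num_tokens / 1000 * price_per_1k_usd


if __name__ == "__main__":
    # Example: three retrieved chunks of 512 tokens plus a 200-token question.
    prompt_tokens = 3 * 512 + 200
    print(f"gpt-4:         ${estimate_cost(prompt_tokens, GPT4_PRICE_PER_1K):.4f}")
    print(f"gpt-3.5-turbo: ${estimate_cost(prompt_tokens, GPT35_PRICE_PER_1K):.4f}")
```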
License
AIrXiv is licensed under the MIT License.