cord19q
COVID-19 Open Research Dataset (CORD-19) Analysis
COVID-19 Open Research Dataset (CORD-19) is a free resource of scholarly articles, aggregated by a coalition of leading research groups, covering COVID-19 and the coronavirus family of viruses. The dataset can be found on Semantic Scholar and Kaggle.
The cord19q project builds an index over the CORD-19 dataset to assist with analysis and data discovery. A series of COVID-19-related research topics was explored to identify relevant articles and help answer key scientific questions.
Tasks
A full list of Kaggle CORD-19 Challenge tasks can be found in this notebook. This notebook and corresponding report notebooks won 🏆 7 awards 🏆 in the Kaggle CORD-19 Challenge.
The latest tasks are also stored in the cord19q repository.
Installation
cord19q can be installed directly from GitHub using pip. Using a Python Virtual Environment is recommended.
pip install git+https://github.com/neuml/cord19q
Python 3.6+ is supported.
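Before installing, it can help to confirm that the interpreter meets the stated minimum version and that a virtual environment is active. The following is a minimal sketch using only the standard library; the helper names `python_supported` and `in_virtualenv` are hypothetical and not part of cord19q.

```python
import sys

# Minimum Python version stated in the installation notes above
MIN_VERSION = (3, 6)


def python_supported(version_info=None):
    """Return True if the given (or current) interpreter version is at least MIN_VERSION."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) >= MIN_VERSION


def in_virtualenv():
    """Return True when running inside a venv-style virtual environment."""
    # Inside a venv, sys.prefix points at the environment while
    # sys.base_prefix points at the base Python installation.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


if __name__ == "__main__":
    print("Python supported:", python_supported())
    print("Virtual environment active:", in_virtualenv())
```

If the second check reports `False`, creating an environment with `python -m venv` and activating it before running the pip command above is one common approach.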
Building a model
cord19q relies on paperetl to parse and load the CORD-19 dataset into a SQLite database. paperai is then used to run an AI-Powered Literature Review over the CORD-19 dataset for a list of query tasks.
The following links show how to parse, load and index CORD-19.
The model will be stored in ~/.cord19
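To check whether a model has already been built, the directory named above can be inspected. This is a small sketch that only assumes the `~/.cord19` location stated in the text; the file names inside that directory are not documented here, so the code lists whatever it finds rather than expecting specific files. The helper names are hypothetical.

```python
from pathlib import Path


def model_dir(home=None):
    """Return the model directory (~/.cord19), optionally under a custom home path."""
    home = Path(home) if home is not None else Path.home()
    return home / ".cord19"


def list_model_files(home=None):
    """Return the relative paths of all files under the model directory, or [] if it is missing."""
    path = model_dir(home)
    if not path.is_dir():
        return []
    return sorted(p.relative_to(path).as_posix() for p in path.rglob("*") if p.is_file())


if __name__ == "__main__":
    files = list_model_files()
    if files:
        print("Model files found in", model_dir())
        for name in files:
            print(" ", name)
    else:
        print("No model found in", model_dir())
```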
Building a report file
A report file is a Markdown file generated from a list of queries. For example, the following command builds a report from a task file:
python -m paperai.report tasks/risk-factors.yml
Once complete, a file named tasks/risk-factors.md will be created.
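The same step can be scripted when several task files need to be processed. The sketch below wraps the command shown above in `subprocess` and derives the expected output name by swapping the `.yml` suffix for `.md`. That naming rule is inferred from the single example in this section and is an assumption, as are the helper names `report_command`, `expected_report_path` and `build_report`.

```python
import subprocess
import sys
from pathlib import Path


def report_command(task_file):
    """Build the argument list equivalent to: python -m paperai.report <task_file>."""
    return [sys.executable, "-m", "paperai.report", str(task_file)]


def expected_report_path(task_file):
    """Return the Markdown path expected for a task file (assumed: same name, .md suffix)."""
    return Path(task_file).with_suffix(".md")


def build_report(task_file):
    """Run the report command (requires paperai to be installed) and return the expected output path."""
    subprocess.run(report_command(task_file), check=True)
    return expected_report_path(task_file)


if __name__ == "__main__":
    # Only print what would be run; call build_report() to actually execute it.
    task = "tasks/risk-factors.yml"
    print("Command:", " ".join(report_command(task)))
    print("Expected report:", expected_report_path(task).as_posix())
```

Calling `build_report("tasks/risk-factors.yml")` raises `subprocess.CalledProcessError` if the command exits with a non-zero status, which makes failures visible when looping over many task files.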
Running queries
The fastest way to run queries is to start a paperai shell:
paperai
A prompt will come up. Queries can be typed directly into the console.
Related Efforts
The following is a list of related efforts built on this repository.
- COVID-19 Dataset Search (Credit: Ankur Mohan). Thank you to Ankur for sharing and putting together comprehensive documentation on how cord19q works!