mongo-rdkit
GSoC 2020 project to integrate the RDKit and MongoDB
Mongo-rdkit is an integration between MongoDB, a NoSQL database platform, and RDKit, a collection of cheminformatics and machine-learning software. This package contains tools to create and manipulate a chemically-intelligent database, as well as methods for high-performance searches on the database that leverage native MongoDB features.
Useful links:
- BSD License - a business-friendly license for open-source software.
- Jupyter Notebooks - walkthroughs for main functionality.
- Testing Guide - walkthrough of running mongordkit tests.
Documentation
Jupyter Notebooks and other resources for getting started are in the docs folder on GitHub.
Installation
macOS and Linux:
Ensure that you have either Anaconda or Miniconda installed and that conda has been added to PATH.
Clone the repository into your desired directory.
Navigate so that your current working directory is mongo-rdkit.
Create a conda environment called mongo_rdkit that includes all dependencies needed for this package:
conda env create --quiet --force --file env.yml
Activate said conda environment:
source activate mongo_rdkit
Install a local copy of mongo-rdkit by running this from the same directory as setup.py (mongo-rdkit is not yet published to PyPI):
pip install -e .
You can now import mongordkit in your Python interpreter or run all tests using the pytest command.
Windows:
Similarly, ensure that conda has been added to PATH.
Clone the repository into your desired directory and navigate into it.
Create a conda environment called mongo_rdkit that includes dependencies:
conda env create --quiet --force --file env.yml
Activate this conda environment:
call activate mongo_rdkit
Check that you are able to import mongordkit:
python -c "import mongordkit"
If this fails, you may need to add the current directory manually to PYTHONPATH:
set PYTHONPATH=%PYTHONPATH%;C:.
You can now use mongordkit in your interpreter and run tests using python -m pytest.
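On either platform, a quick way to confirm the environment is set up is to ask Python whether the relevant modules can be found before trying to use them. The sketch below uses only the standard library. The names `rdkit` and `pymongo` are assumptions about what env.yml installs (this README does not list them); `mongordkit` is the import name given above.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if the named top-level module can be found on the import path."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # "rdkit" and "pymongo" are assumed dependency names, not taken from env.yml.
    for name in ("rdkit", "pymongo", "mongordkit"):
        status = "found" if is_installed(name) else "MISSING"
        print(f"{name}: {status}")
```

If `mongordkit` is reported as missing, revisit the `pip install -e .` or PYTHONPATH step for your platform.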
Package Contents
Modules
mongordkit contains two main modules, each of which contains a variety of importable methods and classes. Database contains functionality for writing and registering data. Search contains functionality for setting up and performing substructure and similarity search. Detailed walkthroughs can be found in the notebooks, listed below.
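For readers new to similarity search, the idea can be illustrated without any database at all. The following is a pure-Python sketch, not mongordkit's API: it represents each molecule's fingerprint as a set of "on" bit indices and scores candidates with the Tanimoto coefficient, a common choice in cheminformatics. Whether the Search module uses this exact measure is not stated in this README, and the function names and toy fingerprints are hypothetical.

```python
def tanimoto(a: set, b: set) -> float:
    """Tanimoto similarity: shared bits divided by total distinct bits."""
    if not a and not b:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / len(a | b)


def similarity_search(query: set, fingerprints: dict, threshold: float) -> list:
    """Return (id, score) pairs scoring at or above threshold, best first."""
    scored = ((mol_id, tanimoto(query, fp)) for mol_id, fp in fingerprints.items())
    hits = [pair for pair in scored if pair[1] >= threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


# Toy fingerprints (hypothetical data): sets of "on" bit positions.
fingerprints = {
    "mol_a": {1, 2, 3, 4},
    "mol_b": {3, 4, 5, 6},
    "mol_c": {7, 8},
}
results = similarity_search({1, 2, 3, 5}, fingerprints, threshold=0.3)
print(results)
```

A naive scan like this compares the query against every stored fingerprint; the notebooks listed below document how mongo-rdkit sets up and performs its own searches.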
Notebooks
- Creating and Writing to MongoDB: documentation and demos for creating and modifying mongo-rdkit databases.
- Similarity and Substructure Search: documentation and demos for similarity and substructure search.
- Similarity Benchmarking: documentation for reproducing similarity benchmarking.
- Substructure Benchmarking: documentation for reproducing substructure benchmarking.
Configuration
- azure_pipelines.yml: CI/CD pipeline configurations.
- conftest.py: pytest configurations.
- env.yml: required dependencies.
- setup.py: Python package setup, including pip dependencies.
License
Code released under the BSD License.