
OpenFold Local Jupyter Notebook 📔 | Metrics, Plots, Concurrent Inference

Open juliocesar-io opened this issue 5 months ago • 0 comments

Overview

This PR introduces a fully featured local notebook for running inference, computing metrics, ranking the best model, and generating plots in a structured and reproducible way, particularly for experiments on large datasets.

The metrics are similar to those in the Colab notebook, but the workflow is optimized for a local installation with Docker. The PR also adds parallel execution to make use of multiple GPUs.

The notebook operates by executing Docker commands using the Docker client and accessing OpenFold functions within a standalone environment. This approach leaves the OpenFold codebase untouched: the notebook acts as a client that helps reproduce metrics and results from the Colab notebook locally.
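
One way to picture the multi-GPU parallel execution is a round-robin assignment of inputs to GPU indices, with one worker thread draining each GPU's queue. This is only a sketch under that assumption, not the notebook's actual scheduler: `assign_round_robin`, `run_on_gpus`, and the `run_one` callback are hypothetical names, and `run_one` stands in for whatever launches a container pinned to a given GPU.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Sequence


def assign_round_robin(items: Sequence[str], gpu_ids: Sequence[int]) -> Dict[int, List[str]]:
    """Distribute items across GPU ids in round-robin order (hypothetical helper)."""
    if not gpu_ids:
        raise ValueError("at least one GPU id is required")
    plan: Dict[int, List[str]] = {gpu: [] for gpu in gpu_ids}
    for index, item in enumerate(items):
        plan[gpu_ids[index % len(gpu_ids)]].append(item)
    return plan


def run_on_gpus(
    items: Sequence[str],
    gpu_ids: Sequence[int],
    run_one: Callable[[int, str], object],
) -> Dict[int, List[object]]:
    """Run `run_one(gpu_id, item)` for every item, one worker thread per GPU."""
    plan = assign_round_robin(items, gpu_ids)

    def drain(gpu: int) -> List[object]:
        # Each GPU works through its own queue sequentially,
        # so no two jobs share a device at the same time.
        return [run_one(gpu, item) for item in plan[gpu]]

    with ThreadPoolExecutor(max_workers=len(gpu_ids)) as pool:
        per_gpu = list(pool.map(drain, gpu_ids))
    return dict(zip(gpu_ids, per_gpu))
```

Threads are enough here because, in this sketch, each worker would mostly wait on an external container rather than do Python-side computation.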

Usage

Refer to the instructions in notebooks/OpenFoldLocal.ipynb.

Set up the notebook

First, build OpenFold using Docker. Follow this guide.

Then, go to the notebooks folder

cd notebooks

Create an environment to run Jupyter with the requirements

mamba create -n openfold_notebook python==3.10

Activate the environment

mamba activate openfold_notebook

Install the requirements

pip install -r src/requirements.txt

Start your Jupyter server in the current folder

jupyter lab . --ip="0.0.0.0"

Open the notebook URL in a browser, or connect remotely using VS Code.

Inference example

Initializing the client:

import docker
from src.inference import InferenceClientOpenFold

# You can also use a remote Docker server
docker_client = docker.from_env()

# Initialize the OpenFold Docker client, setting the database path
databases_dir = "/path/to/databases"

openfold_client = InferenceClientOpenFold(databases_dir, docker_client)

Running Inference:

# For multiple sequences, separate sequences with a colon `:`
input_string = "DAGAQGAAIGSPGVLSGNVVQVPVHVPVNVCGNTVSVIGLLNPAFGNTCVNA:AGETGRTGVLVTSSATNDGDSGWGRFAG"

model_name = "multimer"  # or "monomer"
weight_set = "AlphaFold"  # or "OpenFold"

# Run inference
run_id = openfold_client.run_inference(weight_set, model_name, inference_input=input_string)
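
To make the colon-separated input format concrete, here is a minimal sketch that splits such a string into chains and renders them as FASTA records. It only illustrates the format: `split_chains`, `to_fasta`, and the `chain_N` header naming are hypothetical, and the notebook's own parsing may differ.

```python
from typing import List


def split_chains(input_string: str) -> List[str]:
    """Split a colon-separated input string into individual chain sequences."""
    chains = [chain.strip() for chain in input_string.split(":")]
    if any(not chain for chain in chains):
        raise ValueError("empty chain in input string")
    return chains


def to_fasta(input_string: str, prefix: str = "chain") -> str:
    """Render each chain as a FASTA record with an illustrative header."""
    records = []
    for number, chain in enumerate(split_chains(input_string), start=1):
        records.append(f">{prefix}_{number}\n{chain}")
    return "\n".join(records) + "\n"
```

A two-chain string such as the multimer example above yields two records, while a string with no colon yields one.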

Using a file:

input_file = "/path/to/test.fasta"

run_id = openfold_client.run_inference(weight_set, model_name, inference_input=input_file)
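
For larger experiments, the same call can be repeated over many inputs while keeping track of the returned run IDs. The sketch below assumes only the `run_inference(weight_set, model_name, inference_input=...)` call shown above. `run_many` is a hypothetical helper, and whether `run_inference` blocks until a run finishes is not stated here, so the loop makes no assumption about it.

```python
from typing import Dict, Iterable


def run_many(
    client,
    inputs: Iterable[str],
    weight_set: str = "AlphaFold",
    model_name: str = "multimer",
) -> Dict[str, object]:
    """Call client.run_inference once per input and map each input to its run id."""
    run_ids: Dict[str, object] = {}
    for inference_input in inputs:
        # Each input may be a sequence string or a path to a FASTA file,
        # mirroring the two examples above.
        run_ids[inference_input] = client.run_inference(
            weight_set, model_name, inference_input=inference_input
        )
    return run_ids
```

For a folder of FASTA files, `inputs` could be built with `sorted(str(p) for p in Path("/path/to/fastas").glob("*.fasta"))` using `pathlib.Path`.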

Screenshots

[Two screenshots, dated 2024-08-27]

juliocesar-io, Aug 28 '24 00:08