Envision2017

We present Envision, an accurate predictor of protein variant molecular effect, trained using large-scale experimental mutagenesis data. All data and software in this study are freely available. The t...

Our code is separated into eight Jupyter Notebook files (.ipynb) and one R Markdown file (.Rmd).

The Jupyter Notebooks contain the following:

  • singleProteinModels.ipynb -- code for tuning hyperparameters and training models using each of the eight protein data sets individually.

  • envisionTuneTrainPredict.ipynb -- code to tune hyperparameters and train Envision with all eight data sets.

  • LOPOTuneTrain.ipynb -- train each leave-one-protein-out (LOPO) model to predict the protein data set not used in training.

  • LOPO_10xCV.ipynb -- tune using tenfold cross-validation, then train each leave-one-protein-out (LOPO) model to predict the protein data set not used in training.

  • LOPO_predict_missingFeatureMuts.ipynb -- use each leave-one-protein-out (LOPO) model to predict mutations with missing features in the protein data set not used in training.

  • LOPO_unnormalized.ipynb -- train each leave-one-protein-out (LOPO) model with unnormalized data and then predict protein data sets not used in training.

  • downSamplingAnalysis.ipynb -- code to sample 6, 4, and 2 proteins as training data for model training.

  • Clinvar_analysis.ipynb -- use Envision to predict ClinVar mutations.
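
As a rough illustration of the leave-one-protein-out and down-sampling schemes named above, the sketch below builds the train/held-out splits using only the Python standard library. It is not taken from the notebooks: the protein names, the `lopo_splits` and `down_sample` helpers, and the fixed random seed are all hypothetical, and the actual model tuning and training are omitted.

```python
import random

# Hypothetical placeholder names; the real data sets are in the /data directory.
PROTEINS = [f"protein_{i}" for i in range(1, 9)]


def lopo_splits(proteins):
    """Yield (training_proteins, held_out_protein) for each LOPO fold."""
    for held_out in proteins:
        # Train on every protein except the one being predicted.
        training = [p for p in proteins if p != held_out]
        yield training, held_out


def down_sample(proteins, k, seed=0):
    """Pick k proteins at random to serve as a reduced training set."""
    rng = random.Random(seed)  # fixed seed (an assumption) for repeatability
    return sorted(rng.sample(proteins, k))


if __name__ == "__main__":
    for training, held_out in lopo_splits(PROTEINS):
        print(f"train on {len(training)} proteins, predict {held_out}")
    for k in (6, 4, 2):
        print(k, down_sample(PROTEINS, k))
```

Each LOPO fold trains on seven proteins and predicts the eighth, so the held-out protein never contributes to training.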


The R Markdown file contains the following:

  • envision_figure_code.Rmd -- code for generating manuscript figures.

Notes:

  • All necessary data files can be found in the /data directory.

  • GraphLab and Python dependencies (e.g. NumPy) are required to run all .ipynb code.

  • All code will be deposited in a public GitHub repository upon publication.