metk icon indicating copy to clipboard operation
metk copied to clipboard

Model Evaluation Toolkit

metk

Model Evaluation Toolkit ^^^^^^^^^^^^^^^^^^^^^^^^

| In metk, I've collected a set of routines for evaluating predictive models. | I put a lot of this code together when I was doing the evaluation for the | TDT <http://www.teach-discover-treat.org/>__ and D3R <https://drugdesigndata.org/>__ | projects, as well as | a book chapter I wrote in 2013 <http://onlinelibrary.wiley.com/doi/10.1002/9781118742785.ch1/summary>__. | I'm releasing this project as a way for the community to collaborate | and (hopefully) agree on best practices for model evaluation. Most of the | initial release is oriented toward the evaluation of free energy calculations.

| This is just a start and I plan to add a lot more. Currently, there are | routines to calculate

  • Root mean squared (RMS) error
  • Mean absolute error (MAE)
  • Pearson correlation coefficient (with confidence limits)
  • Spearman rank correlation (rho) (still need to add confidence limits)
  • Kendall tau (still need to add confience limits)
  • Maximum possible correlation given a specific experimental error. This is based on on a 2009 paper by Brown, Muchmore and Hajduk <http://www.sciencedirect.com/science/article/pii/S1359644609000403>__

| Most of the statistics is done with routines from scikitlearn <http://scikit-learn.org/stable/>__ | and scipy <https://www.scipy.org/>__.

| The toolkit also includes code to generate a few diagnositc plots that I | find helpful when looking at model performance. Examples of these plots can be found | here <https://figshare.com/articles/metk_out_pdf/5258080>__

  • A scatter plot of experimental vs predicted ΔG. Lines are drawn at 1 and 2 kcal error
  • A histogram of the error distribution.
  • The two plots above with ΔG converted to a binding affinity (in uM or nM). On the scatter plot, lines are drawn at 5-fold and 10-fold error. I find that I mentally relate to a fold error in binding affinity better than I do to error expressed in kcal/mol. However, if you like looking at error in kcal/mol, use that plot.

| Ultimately, the plan is to implement a number of other methods for model | evaluation including those described in papers by Anthony Nicholls.

Usage ^^^^^

| This relase of metk contains a rudimentary command-line interface. More options | will be added in time.

::

Usage: metk.py --in INFILE_NAME --prefix OUTFILE_PREFIX [--units UNIT_NAME] [--example]

--in INFILE_NAME         input file name
--prefix OUTFILE_PREFIX  prefix for output file names
--units UNIT_NAME        units to display (uM (default) or nM)
--example                show example command lines

Installation ^^^^^^^^^^^^

The toolkit works under both Python 2.7 and Python 3.6. Installation is relatively painless.

#. Install the dependencies, you can do this with pip

::

   pip install numpy pandas matplotlib scipy docopt

#. | Get the code from github. You can either download and unpack the zip file | or just clone the repository.

::

   git clone https://github.com/PatWalters/metk.git

#. | There's one more trick to make the plots work with matplotlib. When pip installed | matplotlib, it created a directory under your home directory called .matplotlib. Create | a file in this directory called matplotlibrc and put this line in that file.

::

backend: TkAgg

#. | At this point you should be all set. The main script is metk.py. The other Python | files need to either be in the same directory or in your PYTHONPATH. You can | then run the script with this command.

::

   python metk.py

| If you're running under Linux or OS-X and you hate typing "python" all the time | (I know I do) you can do

::

   chmod +x metk.py
   ./metk.py

A Few Notes ^^^^^^^^^^^

I use tabs <https://www.youtube.com/watch?v=SsoOG6ZeyUI>__

Please don't hesitate to let me know if you run into problems or have additions or improvements.

Pat Walters - July 2017