interpretable-robust-trees icon indicating copy to clipboard operation
interpretable-robust-trees copied to clipboard

Connecting Interpretability and Robustness in Decision Trees through Separation

Connecting Interpretability and Robustness in Decision Trees through Separation

This repository contains the code of the experiments in the paper

Connecting Interpretability and Robustness in Decision Trees through Separation

Authors: Michal Moshkovitz, Yao-Yuan Yang, Kamalika Chaudhuri

Abstract

Recent research has recognized interpretability and robustness as essential properties of trustworthy classification. Curiously, a connection between robustness and interpretability was empirically observed, but the theoretical reasoning behind it remained elusive. In this paper, we rigorously investigate this connection. Specifically, we focus on interpretation using decision trees and robustness to $l_\infty$-perturbation. Previous works defined the notion of $r$-separation as a sufficient condition for robustness. We prove upper and lower bounds on the tree size in case the data is $r$-separated. We then show that a tighter bound on the size is possible when the data is linearly separated. We provide the first algorithm with provable guarantees both on robustness, interpretability, and accuracy in the context of decision trees. Experiments confirm that our algorithm yields classifiers that are both interpretable and robust and have high accuracy.

Installation

pip install -r requirements.txt

Install LP, QP Solvers

  • Install gurobi: https://www.cvxpy.org/install/index.html#install-with-gurobi-support
  • Install GLPK: https://www.gnu.org/software/glpk/
  • Install CVXOPT with GLPK support:
CVXOPT_BUILD_GLPK=1
CVXOPT_GLPK_LIB_DIR=/path/to/glpk-X.X/lib
CVXOPT_GLPK_INC_DIR=/path/to/glpk-X.X/include
pip install --upgrade cvxopt

submodules

git submodule init
git submodule update

Install LCPA

cd risk-slim
pip install -r requirements.txt
pip install .

For more LCPA installation instructions, please visit https://github.com/ustunb/risk-slim

Install robust decision tree (RobDT)

cd RobustTrees
git submodule update --init --recursive
./build.sh
cd python-package
python setup.py install

For more RobDT installation instructions, please visit https://github.com/chenhongge/RobustTrees

Scripts

Dataset process scripts

  • scripts/proc_bank.py
  • scripts/proc_data.py
  • scripts/proc_heart.py
  • scripts/proc_svmlight.py

Experiment scripts

  • params.py: listed all parameters run
  • rsep_explain/datasets/init.py: load datasets
  • experiments/lin_sep_bbm_rob_3.py: run experiment for BBM-RS
  • experiments/dt_interpret_rob_3.py: run experiment for DT (Breiman et al., 1984)
  • experiments/xgboostrobdt_interpret_rob.py: run experiment for RobDT (Chen et al., 2019)
  • experiments/risk_slim_3.py: run experiment for LCAP (Ustun & Rudin, 2019)
  • experiments/calc_lin_separation.py: estimating the linear separateness of each dataset
  • experiments/calc_separation.py: estimating the $r$- separateness of each dataset

Figure/Table generation scripts

  • notebooks/case_study.ipynb: generate Table 1
  • notebooks/separation.ipynb: generate Table 2
  • notebooks/risk_score_3.ipynb: generate Table 3 and 4
  • notebooks/tradeoff.ipynb: generate images in Figure 1 and 4
  • notebooks/plot_bbm.ipynb: generate images in Figure 5

Parameters

usage: main.py [-h] [--no-hooks] --experiment
               {lin_sep_bbm_rob_3,risk_slim_3,dt_interpret_rob_3,xgboostrobdt_interpret_rob,calc_lin_separation,calc_separation}
               --dataset DATASET --preprocessor PREPROCESSOR --random_seed
               RANDOM_SEED --rsep RSEP

Datasets: {risk_ionosphere, risk_diabetes, risk_breastcancer, risk_adult, risk_mushroom, risk_mammo, risk_spambase, risk_bank, risk_careval, risk_compasbin, risk_ficobin, risk_bank_2, risk_heart}

Algorithm implementations

  • Boosting by majority (BBM)

Examples

The result of each example is outputed as a joblib pickle file named temp.pkl.

Run BBM-RS with $\tau = 0.05$ on the bank dataset.

python main.py --no-hooks --experiment lin_sep_bbm_rob_3 \
  --dataset risk_bank --preprocessor rminmax \
  --rsep 0.05 \
  --random_seed 0

Run RobDT with robust radius $ = 0.1$ on the mammo dataset.

python main.py --no-hooks --experiment xgboostrobdt_interpret_rob \
  --dataset risk_mammo --preprocessor rminmax \
  --rsep 0.1 \
  --random_seed 0

Run LCAP on the mammo dataset.

python main.py --no-hooks --experiment risk_slim_3 \
  --dataset risk_mammo --preprocessor rminmax \
  --random_seed 0

Run DT on the heart dataset.

python main.py --no-hooks --experiment dt_interpret_rob_3 \
  --dataset risk_heart --preprocessor rminmax \
  --random_seed 0

Calculate the r-separateness of the heart dataset.

python main.py --no-hooks --experiment calc_separation \
  --dataset risk_heart --preprocessor rminmax \
  --random_seed 0

Calculate the linear separateness of the heart dataset.

python main.py --no-hooks --experiment calc_lin_separation \
  --dataset risk_heart --preprocessor rminmax \
  --random_seed 0