
Example: finding features with epistatic effects using scikit-mdr

Open · weixuanfu opened this issue 6 years ago · 1 comment

It seems that the utilities in mdr.utils are designed for this purpose, but there is no documentation on how to use them. I had a quick look at the code and made a demo that calculates scores for n-way feature combinations. I think it may be a way to find feature combinations with epistatic effects. Please let me know if this is the correct approach.
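To make the idea concrete, here is a minimal sketch of MDR-style scoring written from scratch. It is not scikit-mdr's implementation, and the helper name `mdr_accuracy` is hypothetical. Each sample is assigned to a grid cell given by its genotype tuple; a cell is labelled high-risk when its case:control ratio exceeds the overall ratio, and the score is the resulting accuracy. On a toy XOR-style dataset, each feature alone scores 0.5 while the pair scores 1.0, which is the kind of pattern an n-way search is meant to surface.

```python
from collections import defaultdict
from itertools import product


def mdr_accuracy(columns, y):
    """Training accuracy of a minimal MDR-style model (a sketch, not scikit-mdr's code).

    columns: list of feature columns (each a list of genotype codes)
    y: list of binary labels (0 = control, 1 = case)
    """
    rows = list(zip(*columns))            # one genotype tuple (grid cell) per sample
    counts = defaultdict(lambda: [0, 0])  # cell -> [controls, cases]
    for cell, label in zip(rows, y):
        counts[cell][label] += 1
    # Overall case:control ratio (assumes the data contains at least one control)
    overall_ratio = sum(y) / (len(y) - sum(y))
    # High-risk (1) if the cell's case:control ratio exceeds the overall ratio.
    # Ties are treated as low-risk (0); this is an assumption of the sketch.
    high_risk = {cell: int(cases > overall_ratio * controls)
                 for cell, (controls, cases) in counts.items()}
    predictions = [high_risk[cell] for cell in rows]
    return sum(p == t for p, t in zip(predictions, y)) / len(y)


# Toy epistasis: two binary features, each combination seen 25 times,
# and the class is 1 exactly when the two features differ (XOR).
samples = list(product([0, 1], repeat=2)) * 25
a = [s[0] for s in samples]
b = [s[1] for s in samples]
y = [s[0] ^ s[1] for s in samples]

print("a alone:", mdr_accuracy([a], y))     # 0.5
print("b alone:", mdr_accuracy([b], y))     # 0.5
print("a and b:", mdr_accuracy([a, b], y))  # 1.0
```

Neither feature carries a marginal signal here, so a single-feature screen would miss both; only scoring the 2-way combination reveals the interaction.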

from mdr import MDRClassifier
import pandas as pd
from mdr.utils import n_way_models
import operator

genetic_data = pd.read_csv('https://github.com/EpistasisLab/scikit-mdr/raw/development/data/GAMETES_Epistasis_2-Way_20atts_0.4H_EDM-1_1.tsv.gz', sep='\t', compression='gzip')

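features = genetic_data.drop('class', axis=1).values
labels = genetic_data['class'].values
# Take names from the feature columns only, so they line up with the columns of `features`
feature_names = list(genetic_data.drop('class', axis=1).columns)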

my_mdr = MDRClassifier()
my_mdr.fit(features, labels)
# Note: this is training accuracy (fitted and scored on the same data), so it is optimistic
print("Score for using all features", my_mdr.score(features, labels))

# n: list (default: [2])
# The size(s) of the MDR models to generate,
# e.g., if n == [3], all 3-way models will be generated;
# if n == [2, 3], all 2-way and all 3-way models will be generated.
n = [2]
mdr_score_list = []
#  Note that this function performs an exhaustive search through all feature combinations and can be computationally expensive.
for _, mdr_model_score, model_features in n_way_models(my_mdr, features, labels, n=n, feature_names=feature_names):
    mdr_score_list.append((model_features, mdr_model_score))
mdr_score_list.sort(key=operator.itemgetter(1), reverse=True)
print("The combination with highest score:", mdr_score_list[0])

Output:

Score for using all features 0.998125
The combination with highest score: (['P1', 'P2'], 0.793125)
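The comment in the script warns that the search is exhaustive. Assuming it fits one model for every combination of each requested size (as the `n` comment describes), the number of models is the binomial coefficient C(p, k) for p features. The sketch below, using a hypothetical helper `n_way_combinations`, enumerates those index tuples for the 20 attributes in the GAMETES file above to show how quickly the cost grows.

```python
from itertools import combinations
from math import comb  # Python 3.8+


def n_way_combinations(n_features, sizes):
    """Yield every feature-index tuple an exhaustive n-way search would fit (hypothetical helper)."""
    for size in sizes:
        yield from combinations(range(n_features), size)


n_features = 20  # the example dataset has 20 attributes
for sizes in ([2], [3], [4], [2, 3]):
    n_models = sum(1 for _ in n_way_combinations(n_features, sizes))
    # The count equals the sum of binomial coefficients C(20, k) over the requested sizes
    print(sizes, n_models, sum(comb(n_features, k) for k in sizes))
```

With 20 features this gives 190 two-way, 1140 three-way and 4845 four-way models, so n = [2] is cheap here, but larger n or wider datasets quickly become expensive.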

weixuanfu · Sep 25 '18 20:09

The test code worked for me and did what I had wanted to do. Thanks!

amyxlu · Sep 27 '18 18:09