scikit-mdr
Example of finding features with epistatic effects using scikit-mdr
It seems that the utilities in mdr.utils are designed for this purpose, but there is no documentation on how to use them. I had a quick look at the code and put together a demo that calculates scores for n-way feature combinations. I think it may be a way to find feature combinations with epistatic effects. Please let me know if this is the correct approach.
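Conceptually, scoring every n-way combination is an exhaustive search: enumerate each subset of n columns, fit and score a model on just those columns, then rank the results. The sketch below is a minimal pure-Python illustration of that loop under stated assumptions, not scikit-mdr's code. `n_way_scores` and `majority_cell_accuracy` are hypothetical names, and the scorer is a simple majority-vote lookup table standing in for an MDR fit.

```python
from collections import Counter, defaultdict
from itertools import combinations
from math import comb
from typing import Callable, Dict, Iterator, List, Sequence, Tuple

Cell = Tuple[int, ...]
ScoreFn = Callable[[List[Cell], List[int]], float]


def majority_cell_accuracy(X: List[Cell], y: List[int]) -> float:
    """Hypothetical stand-in scorer: predict the majority label of each cell."""
    cells: Dict[Cell, Counter] = defaultdict(Counter)
    for cell, label in zip(X, y):
        cells[cell][label] += 1
    correct = sum(max(c.values()) for c in cells.values())
    return correct / len(y)


def n_way_scores(rows: List[Sequence[int]], y: List[int], names: List[str],
                 score_fn: ScoreFn, n: Sequence[int] = (2,)
                 ) -> Iterator[Tuple[List[str], float]]:
    """Yield (feature names, score) for every combination of each size in n."""
    for size in n:
        for idx in combinations(range(len(names)), size):
            # Project every sample onto the selected columns only.
            subset = [tuple(row[i] for i in idx) for row in rows]
            yield [names[i] for i in idx], score_fn(subset, y)


# Hypothetical toy data: the label is A XOR B; C is noise.
rows = [(0, 0, 0), (0, 1, 1), (1, 0, 0), (1, 1, 1),
        (0, 0, 1), (1, 1, 0), (0, 1, 0), (1, 0, 1)]
y = [0, 1, 1, 0, 0, 0, 1, 1]

ranked = sorted(n_way_scores(rows, y, ["A", "B", "C"], majority_cell_accuracy),
                key=lambda item: item[1], reverse=True)
print(ranked[0])                 # (['A', 'B'], 1.0)
print(comb(20, 2), comb(20, 3))  # 190 1140
```

The `comb` line shows why the search gets expensive: with 20 attributes, as in the GAMETES file from the demo, n = [2] means 190 fits and n = [3] means 1140.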
from mdr import MDRClassifier
import pandas as pd
from mdr.utils import n_way_models
import operator
genetic_data = pd.read_csv('https://github.com/EpistasisLab/scikit-mdr/raw/development/data/GAMETES_Epistasis_2-Way_20atts_0.4H_EDM-1_1.tsv.gz', sep='\t', compression='gzip')
features = genetic_data.drop('class', axis=1).values
labels = genetic_data['class'].values
# Exclude 'class' so the names line up with the columns of `features`
feature_names = list(genetic_data.drop('class', axis=1).columns)
my_mdr = MDRClassifier()
my_mdr.fit(features, labels)
print("Score for using all features", my_mdr.score(features, labels))
# n: list (default: [2])
# The maximum size(s) of the MDR model to generate.
# e.g., if n == [3], all 3-way models will be generated.
n = [2]
mdr_score_list = []
# Note that this function performs an exhaustive search through all feature combinations and can be computationally expensive.
for _, mdr_model_score, model_features in n_way_models(my_mdr, features, labels, n=n, feature_names=feature_names):
    mdr_score_list.append((model_features, mdr_model_score))
mdr_score_list.sort(key=operator.itemgetter(1), reverse=True)
print("The combination with highest score:", mdr_score_list[0])
Output:
Score for using all features 0.998125
The combination with highest score: (['P1', 'P2'], 0.793125)
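Both numbers are training accuracies, since the demo fits and scores on the same data. As a rough, simplified sketch of the MDR idea (assumed here, not a copy of MDRClassifier): every observed genotype combination becomes a cell, a cell is labeled high-risk (1) when its fraction of cases exceeds the overall fraction and low-risk (0) when it falls below, and the score is the accuracy of those cell labels. The function name `mdr_style_score`, its `tie_break` argument, and the toy data are hypothetical. The toy run also hints at why the all-features score is likely so high: when every sample lands in its own cell, the cell labels simply reproduce the training labels.

```python
from collections import defaultdict
from fractions import Fraction
from typing import Dict, List, Tuple

Cell = Tuple[int, ...]


def mdr_style_score(X: List[Cell], y: List[int], tie_break: int = 1) -> float:
    """Training accuracy of a simplified MDR-style high/low-risk labeling.

    Assumes binary labels (0 = control, 1 = case).
    """
    overall = Fraction(sum(y), len(y))  # overall fraction of cases
    counts: Dict[Cell, List[int]] = defaultdict(lambda: [0, 0])
    for cell, label in zip(X, y):
        counts[cell][label] += 1  # [controls, cases] per genotype cell

    cell_label: Dict[Cell, int] = {}
    for cell, (controls, cases) in counts.items():
        frac = Fraction(cases, controls + cases)
        if frac > overall:
            cell_label[cell] = 1  # high-risk cell
        elif frac < overall:
            cell_label[cell] = 0  # low-risk cell
        else:
            cell_label[cell] = tie_break

    correct = sum(cell_label[cell] == label for cell, label in zip(X, y))
    return correct / len(y)


# Hypothetical toy data: columns (A, B, C); the label follows A XOR B
# except for the last sample, and C is noise.
rows = [(0, 0, 0), (0, 1, 1), (1, 0, 0), (1, 1, 1),
        (0, 0, 1), (1, 1, 0), (0, 1, 0), (1, 0, 1)]
y = [0, 1, 1, 0, 0, 0, 1, 0]

pair = [(a, b) for a, b, _ in rows]  # keep only columns A and B
print(mdr_style_score(pair, y))  # 0.875
print(mdr_style_score(rows, y))  # 1.0: every sample is its own cell
```

Exact `Fraction` comparisons are used so that a cell whose case fraction equals the overall fraction reliably hits the tie-break branch instead of depending on floating-point rounding.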
The test code worked for me and did what I had wanted to do. Thanks!