InterpretableMachineLearning2020
Dataset: Leukemia and gene expression
Problem
This is a binary classification problem. Using historical data, models of varying complexity should be developed to predict the type of leukemia: acute lymphoblastic leukemia (ALL) or acute myeloid leukemia (AML). The best models should then be explained with XAI tools at both the instance level and the dataset level.
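The workflow described above (fit a model, then explain it for a single instance and for the whole dataset) can be sketched in a few lines. This is only an illustration under loud assumptions: the data are synthetic stand-ins for three "genes", not the Golub data; the nearest-centroid classifier is a deliberately simple placeholder, not the method of the original authors; and all function names are hypothetical. Permutation importance plays the role of the dataset-level explanation, and per-feature score contributions play the role of the instance-level one.

```python
import random

def fit_centroids(X, y):
    """Return the per-class mean vector of each label."""
    centroids = {}
    for label in set(y):
        rows = [x for x, lab in zip(X, y) if lab == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids

def feature_contributions(centroids, x, pos="AML", neg="ALL"):
    """Instance level: how much each feature pushes x towards `pos`.

    The score is dist^2(x, neg centroid) - dist^2(x, pos centroid),
    which splits into one additive term per feature.
    """
    return [(xi - n) ** 2 - (xi - p) ** 2
            for xi, p, n in zip(x, centroids[pos], centroids[neg])]

def predict(centroids, x):
    """Predict the class whose centroid is closer to x."""
    return "AML" if sum(feature_contributions(centroids, x)) > 0 else "ALL"

def accuracy(centroids, X, y):
    return sum(predict(centroids, x) == lab for x, lab in zip(X, y)) / len(y)

def permutation_importance(centroids, X, y, n_repeats=20, seed=0):
    """Dataset level: mean accuracy drop when one feature column is shuffled."""
    rng = random.Random(seed)
    base = accuracy(centroids, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            col = [x[j] for x in X]
            rng.shuffle(col)
            X_perm = [x[:j] + [v] + x[j + 1:] for x, v in zip(X, col)]
            drops.append(base - accuracy(centroids, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances

# Synthetic toy data (assumption): feature 0 separates the classes,
# features 1 and 2 are pure noise.
data_rng = random.Random(42)
X, y = [], []
for i in range(40):
    label = "AML" if i % 2 else "ALL"
    shift = 4.0 if label == "AML" else 0.0
    X.append([data_rng.gauss(shift, 1.0), data_rng.gauss(0, 1), data_rng.gauss(0, 1)])
    y.append(label)

centroids = fit_centroids(X, y)
acc = accuracy(centroids, X, y)                    # training accuracy of the toy model
importances = permutation_importance(centroids, X, y)
contribs = feature_contributions(centroids, X[1])  # explanation for one patient

print("accuracy:", acc)
print("dataset-level importances:", importances)
print("instance-level contributions:", contribs)
```

On the real data, the same two questions (which genes matter overall, and which genes drove one patient's prediction) would normally be answered with a dedicated XAI library and a held-out test set rather than with training accuracy.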
Data
Source: Gene expression dataset (Golub et al.), from the paper "Molecular Classification of Cancer by Gene Expression Monitoring": https://www.kaggle.com/crawford/gene-expression#data_set_ALL_AML_independent.csv
The original authors used the data to classify each patient's type of cancer from gene expression levels.
Note
Because of the large number of features, this dataset will be more interesting for people who have some experience with medical or biological applications.