InterpretableMachineLearning2020

Dataset: Leukemia and gene expression

Status: Open · pbiecek opened this issue 5 years ago · 0 comments

Problem

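This is a binary classification problem. Using the labelled patient data, develop models of varying complexity that predict the type of leukemia (acute lymphoblastic leukemia, ALL, vs. acute myeloid leukemia, AML). The best models should then be explained with XAI tools both at the instance level (why a given patient received a given prediction) and at the dataset level (which genes matter for the model overall).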
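A minimal sketch of the two levels of explanation, in Python using only the standard library. Everything here is an assumption for illustration: the two-feature synthetic data (label 1 plays the role of AML), and the hypothetical helpers `fit_logistic`, `permutation_importance` and `linear_contributions`. The model is an unregularized logistic regression. The dataset-level explanation is permutation importance (the drop in accuracy when one feature is shuffled). The instance-level explanation is the per-feature contribution w_j(x_j − mean_j) to the log-odds, which for a linear model sums exactly to the patient's log-odds minus the log-odds at the feature means. Dedicated XAI packages provide these and richer, model-agnostic explainers.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, lr=0.1, epochs=500):
    """Batch gradient descent on the mean log-loss; returns (weights, bias)."""
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * p, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(p):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict_proba(model, x):
    w, b = model
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

def accuracy(model, X, y):
    return sum((predict_proba(model, x) >= 0.5) == bool(t) for x, t in zip(X, y)) / len(y)

def permutation_importance(model, X, y, j, seed=0):
    """Dataset level: accuracy lost when column j is randomly shuffled."""
    col = [x[j] for x in X]
    random.Random(seed).shuffle(col)
    X_perm = [x[:j] + [v] + x[j + 1:] for x, v in zip(X, col)]
    return accuracy(model, X, y) - accuracy(model, X_perm, y)

def linear_contributions(model, X, x):
    """Instance level: each feature's log-odds contribution relative to the feature means."""
    w, _ = model
    means = [sum(c) / len(X) for c in zip(*X)]
    return [wj * (xj - mj) for wj, xj, mj in zip(w, x, means)]

# Hypothetical synthetic data: feature 0 separates the classes, feature 1 is pure noise.
rng = random.Random(42)
X, y = [], []
for i in range(200):
    label = i % 2
    X.append([rng.gauss(2.0 if label else -2.0, 1.0), rng.gauss(0.0, 1.0)])
    y.append(label)

model = fit_logistic(X, y)
print("train accuracy:", accuracy(model, X, y))
print("importance of feature 0:", permutation_importance(model, X, y, 0))
print("importance of feature 1:", permutation_importance(model, X, y, 1))
print("contributions for sample 0:", linear_contributions(model, X, X[0]))
```

On the real data, with thousands of genes and relatively few patients, an unregularized fit like this would overfit; the sketch only illustrates what the instance-level and dataset-level views report.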

Data

Source: Golub et al., "Molecular Classification of Cancer: Class Discovery and Class Prediction by Gene Expression Monitoring" (gene expression dataset), https://www.kaggle.com/crawford/gene-expression#data_set_ALL_AML_independent.csv. The original authors used these gene expression profiles to classify each patient's type of cancer.
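The data is distributed as CSV files rather than as a ready-made feature matrix. A minimal sketch of reshaping it into one row per patient, assuming (check this against the downloaded files) that each row is a gene with a `Gene Accession Number` column, that each patient is a numbered column followed by a `call` column, and that labels sit in a separate file with `patient` and `cancer` (ALL/AML) columns. The toy CSV strings and the function `load_patient_matrix` are hypothetical; in practice you would read the real files instead.

```python
import csv
import io

# Hypothetical toy excerpt in the assumed layout: one row per gene, one numeric
# column per patient, each followed by a "call" column. Values are made up.
TOY_EXPRESSION_CSV = """Gene Description,Gene Accession Number,1,call,2,call
gene A (hypothetical),ACC_A,10,A,15,A
gene B (hypothetical),ACC_B,20,P,25,A
gene C (hypothetical),ACC_C,30,A,35,P
"""
# Hypothetical labels file in the assumed layout: patient number and ALL/AML.
TOY_LABELS_CSV = """patient,cancer
1,ALL
2,AML
"""

def load_patient_matrix(expr_text, labels_text):
    """Transpose a gene-by-patient table into one feature row per patient.

    Returns (gene_ids, patient_ids, X, y) with y = 1 for AML and 0 for ALL.
    """
    reader = csv.reader(io.StringIO(expr_text))
    header = next(reader)
    acc_idx = header.index("Gene Accession Number")
    # Patient columns are those whose header is a number; "call" columns are skipped.
    patient_cols = sorted((int(h), i) for i, h in enumerate(header) if h.strip().isdigit())
    gene_rows = [row for row in reader if row]
    gene_ids = [row[acc_idx] for row in gene_rows]
    labels = {
        int(r["patient"]): r["cancer"].strip()
        for r in csv.DictReader(io.StringIO(labels_text))
    }
    patient_ids, X, y = [], [], []
    for pid, col in patient_cols:
        patient_ids.append(pid)
        X.append([float(row[col]) for row in gene_rows])
        y.append(1 if labels[pid] == "AML" else 0)
    return gene_ids, patient_ids, X, y

genes, patients, X, y = load_patient_matrix(TOY_EXPRESSION_CSV, TOY_LABELS_CSV)
print(patients, y)  # [1, 2] [0, 1]
print(X)            # [[10.0, 20.0, 30.0], [15.0, 25.0, 35.0]]
```

Sorting the patient columns by number keeps the rows aligned with the labels even if the file lists patients out of order.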

Note

Because of the large number of features (one expression level per gene), this dataset will be most interesting to people with some experience in medical or biological applications.
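With far more genes than patients, a common first step is to filter genes before modelling. A minimal sketch, assuming a patient-by-gene matrix with labels 1 = AML and 0 = ALL: rank genes by the absolute signal-to-noise score (mean_AML − mean_ALL) / (sd_AML + sd_ALL) and keep the top k. The function names and the toy matrix are hypothetical, and this is only one simple filter among many.

```python
import statistics

def signal_to_noise(X, y, j):
    """(mean_AML - mean_ALL) / (sd_AML + sd_ALL) for gene j; needs >= 2 patients per class."""
    aml = [x[j] for x, t in zip(X, y) if t == 1]
    all_ = [x[j] for x, t in zip(X, y) if t == 0]
    denom = statistics.stdev(aml) + statistics.stdev(all_)
    return (statistics.mean(aml) - statistics.mean(all_)) / denom if denom > 0 else 0.0

def top_k_features(X, y, k):
    """Indices of the k genes with the largest absolute signal-to-noise score."""
    p = len(X[0])
    scores = [abs(signal_to_noise(X, y, j)) for j in range(p)]
    return sorted(range(p), key=lambda j: scores[j], reverse=True)[:k]

# Hypothetical toy matrix: 4 patients x 3 genes.
X = [[1.0, 5.0, 0.0],
     [1.2, 5.1, 3.0],
     [3.0, 5.0, 1.0],
     [3.2, 4.9, 2.0]]
y = [0, 0, 1, 1]
print(top_k_features(X, y, 2))  # [0, 1]
```

To avoid optimistic accuracy estimates, the filter should be refit inside each training fold rather than applied to the full dataset before cross-validation.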

pbiecek, Feb 24 '20 17:02