
Stacking Ensemble Machine Learning With Python

Open khuyentran1401 opened this issue 4 years ago • 0 comments

TL;DR

Stacking, or Stacked Generalization, is an ensemble machine learning algorithm that uses a meta-learning algorithm to learn how best to combine the predictions from two or more base machine learning algorithms.

Article Link

https://machinelearningmastery.com/stacking-ensemble-machine-learning-with-python/

Author

Jason Brownlee

Key Takeaways

  • Stacking combines well-performing models on a classification or regression task and can make predictions that perform better than any single model in the ensemble.
  • Compare different machine learning models under the same cross-validation procedure and choose the best one.
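
To make the first takeaway concrete, here is a minimal sketch of stacking done by hand. It is not the article's code: the train/test split, the choice of two base models, and the names `train_meta`, `test_meta`, and `meta_model` are assumptions made for illustration. Each base model produces out-of-fold predictions on the training data, and a logistic regression meta-model learns how to combine them.

```python
# minimal sketch of stacking by hand (illustrative setup, not the article's code)
from numpy import column_stack
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split, cross_val_predict
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# same synthetic dataset as the snippet in this issue
X, y = make_classification(n_samples=1000, n_features=20, n_informative=15, n_redundant=5, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1, stratify=y)

# base (level-0) models; the choice of these two is an assumption
base_models = [KNeighborsClassifier(), DecisionTreeClassifier(random_state=1)]

# out-of-fold class-1 probabilities: each training row is predicted by a
# model that did not see it, so the meta-model is not trained on leaked labels
train_meta = column_stack([
    cross_val_predict(m, X_train, y_train, cv=5, method='predict_proba')[:, 1]
    for m in base_models
])

# refit each base model on the full training set and predict the test set
test_meta = column_stack([
    m.fit(X_train, y_train).predict_proba(X_test)[:, 1]
    for m in base_models
])

# meta (level-1) model learns how to weight the base predictions
meta_model = LogisticRegression()
meta_model.fit(train_meta, y_train)
stacked_accuracy = meta_model.score(test_meta, y_test)
print('stacked accuracy: %.3f' % stacked_accuracy)
```

Using out-of-fold predictions as the meta-model's training features is the key design choice here. Fitting the meta-model on in-sample base predictions would let it over-trust whichever base model overfits most.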

Useful Code Snippets

# compare standalone models for binary classification
from numpy import mean
from numpy import std
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import RepeatedStratifiedKFold
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB
from matplotlib import pyplot
 
# get the dataset
def get_dataset():
	X, y = make_classification(n_samples=1000, n_features=20, n_informative=15, n_redundant=5, random_state=1)
	return X, y
 
# get a list of models to evaluate
def get_models():
	models = dict()
	models['lr'] = LogisticRegression()
	models['knn'] = KNeighborsClassifier()
	models['cart'] = DecisionTreeClassifier()
	models['svm'] = SVC()
	models['bayes'] = GaussianNB()
	return models
 
# evaluate a given model using repeated stratified k-fold cross-validation
def evaluate_model(model, X, y):
	cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)
	# error_score='raise' surfaces fitting errors instead of silently scoring NaN
	scores = cross_val_score(model, X, y, scoring='accuracy', cv=cv, n_jobs=-1, error_score='raise')
	return scores
 
# define dataset
X, y = get_dataset()
# get the models to evaluate
models = get_models()
# evaluate the models and store results
results, names = list(), list()
for name, model in models.items():
	scores = evaluate_model(model, X, y)
	results.append(scores)
	names.append(name)
	# report mean accuracy and standard deviation across all folds and repeats
	print('>%s %.3f (%.3f)' % (name, mean(scores), std(scores)))
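# plot model performance for comparison
# (tick labels are set with xticks because the boxplot 'labels' argument is deprecated in recent matplotlib releases)
pyplot.boxplot(results, showmeans=True)
pyplot.xticks(range(1, len(names) + 1), names)
pyplot.show()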
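
The snippet above only compares standalone models. As a hedged follow-up sketch, the same five models can be combined with scikit-learn's `StackingClassifier` (available from scikit-learn 0.22). The helper name `get_stacking` and the lighter 5-fold evaluation are assumptions chosen here to keep the example quick to run, so the scores are not directly comparable to the repeated 10-fold results above.

```python
# sketch: evaluate a stacking ensemble of the five standalone models
from numpy import mean, std
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score, StratifiedKFold
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB
from sklearn.ensemble import StackingClassifier

# build the stacking ensemble (get_stacking is a hypothetical helper name)
def get_stacking():
    # level-0: the base models compared above
    level0 = [
        ('lr', LogisticRegression()),
        ('knn', KNeighborsClassifier()),
        ('cart', DecisionTreeClassifier()),
        ('svm', SVC()),
        ('bayes', GaussianNB()),
    ]
    # level-1: meta-model that learns how to combine the base predictions
    level1 = LogisticRegression()
    # cv=5 means the meta-model is fit on out-of-fold base predictions
    return StackingClassifier(estimators=level0, final_estimator=level1, cv=5)

# same synthetic dataset as above
X, y = make_classification(n_samples=1000, n_features=20, n_informative=15, n_redundant=5, random_state=1)

# lighter evaluation than the repeated 10-fold used above (an assumption for speed)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)
stacking_scores = cross_val_score(get_stacking(), X, y, scoring='accuracy', cv=cv)
print('>stacking %.3f (%.3f)' % (mean(stacking_scores), std(stacking_scores)))
```

To compare the ensemble against the standalone models on equal terms, one option is to add `get_stacking()` to the `models` dictionary under a key such as `'stacking'` so it goes through the same evaluation loop.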

Useful Tools

Comments/ Questions

khuyentran1401, Apr 12 '20 01:04