scikit-learn-intelex

Implement Jupyter Notebook example featuring ML performance of optimized Scikit-learn

Open • napetrov opened this issue 5 years ago • 0 comments

The first part of the task is to find a real-life big data problem or a competition on Kaggle whose solution relies on the functionality available in daal4py-optimized Scikit-learn. That is, the solution should run for at least several minutes and spend more than 70% of its time in one or a combination of the following algorithms:

• Linear or ridge regression
• LASSO or elastic net regularization
• Logistic regression
• Principal component analysis (PCA)
• K-Means clustering
• Pairwise distance (cosine or correlation)
• C-support vector classification (SVC)

The second part of the task is to reproduce the solution using Intel-optimized Scikit-learn and check the correctness of the new solution: the accuracy of the Intel-optimized solution should not degrade compared to the original solution. Contribute your example to the repository.
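The 70% criterion can be checked by timing each stage of the original solution and summing the time spent in calls to the listed algorithms. A minimal sketch, assuming the pipeline can be split into named stages; `StageTimer`, `stage()`, `accelerated_fraction()` and `qualifies()` are hypothetical helpers (not part of Scikit-learn or scikit-learn-intelex), and which stages count as "accelerated" is marked by hand.

```python
import time
from contextlib import contextmanager


class StageTimer:
    """Accumulate wall-clock time per named pipeline stage (hypothetical helper)."""

    def __init__(self):
        self.totals = {}          # stage name -> accumulated seconds
        self.accelerated = set()  # stages that call one of the listed algorithms

    @contextmanager
    def stage(self, name, accelerated=False):
        start = time.perf_counter()
        try:
            yield
        finally:
            # Record elapsed time even if the stage raises.
            elapsed = time.perf_counter() - start
            self.totals[name] = self.totals.get(name, 0.0) + elapsed
            if accelerated:
                self.accelerated.add(name)

    def accelerated_fraction(self):
        """Share of total measured time spent in accelerated stages."""
        total = sum(self.totals.values())
        if total == 0.0:
            return 0.0
        return sum(self.totals.get(n, 0.0) for n in self.accelerated) / total


def qualifies(timer, threshold=0.70):
    """True if more than `threshold` of the run is spent in accelerated stages."""
    return timer.accelerated_fraction() > threshold
```

In use, data loading and feature engineering would be wrapped in `timer.stage("...")`, and each `fit`/`predict` call into one of the listed estimators in `timer.stage("...", accelerated=True)`; the solution is a candidate if `qualifies(timer)` is true for a run lasting several minutes.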

  1. The data analytics or machine learning task that satisfies the requirements is found; the solution to the task is reproduced and gives satisfactory accuracy with both vanilla Scikit-learn and Intel-optimized Scikit-learn. Outcome: Jupyter notebook.
  2. The original solution gets one of the following improvements:
     a. The solution shows a significant improvement in the trained model's accuracy using Intel-optimized Scikit-learn compared to vanilla Scikit-learn, while the time spent training the improved model is no longer than the time spent training the original model with vanilla Scikit-learn.
     b. The solution shows a 1.5x speedup in model training using Intel-optimized Scikit-learn compared to vanilla Scikit-learn.
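The acceptance criteria above can be encoded as a small check run on the results of the vanilla and Intel-optimized runs. For the optimized run, current scikit-learn-intelex releases enable the optimizations with `from sklearnex import patch_sklearn` followed by `patch_sklearn()` before the Scikit-learn estimators are imported. A minimal sketch, assuming a higher `score` is better (negate error metrics such as RMSE) and reading "1.5x speedup" as "at least 1.5x"; `RunResult`, the helper functions and the `min_gain` threshold for a "significant" improvement are hypothetical, since the issue does not define them.

```python
from dataclasses import dataclass


@dataclass
class RunResult:
    """Training time in seconds and model score for one run (higher score is better)."""
    fit_seconds: float
    score: float


def accuracy_preserved(stock, patched, tolerance=0.0):
    """Outcome 1: the optimized run must not score worse than the vanilla run."""
    return patched.score >= stock.score - tolerance


def improved_accuracy(stock, patched, min_gain):
    """Improvement (a): better accuracy without longer training.

    `min_gain` is a hypothetical threshold for a "significant" improvement.
    """
    return (patched.score - stock.score >= min_gain
            and patched.fit_seconds <= stock.fit_seconds)


def training_speedup(stock, patched):
    """Ratio of vanilla to optimized training time."""
    return stock.fit_seconds / patched.fit_seconds


def meets_speedup(stock, patched, target=1.5):
    """Improvement (b): model training is at least `target` times faster."""
    return training_speedup(stock, patched) >= target


def acceptance(stock, patched, min_gain=0.01):
    """Accuracy must not degrade, and improvement (a) or (b) must hold."""
    return accuracy_preserved(stock, patched) and (
        improved_accuracy(stock, patched, min_gain)
        or meets_speedup(stock, patched)
    )
```

For example, a vanilla run with `RunResult(30.0, 0.90)` and an optimized run with `RunResult(20.0, 0.90)` passes through improvement (b), while an optimized run that is faster but scores 0.85 fails because accuracy degraded.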

napetrov, Oct 05 '20 17:10