
Allow users to use scikit-learn-intelex on the API inference backend

Open adrinjalali opened this issue 2 years ago • 11 comments

This requirement comes from the contract between Hugging Face and Intel.

We should have a way to allow users to utilize scikit-learn-intelex at inference time. This requires a few steps:

  • Check if models trained with scikit-learn can be served with scikit-learn-intelex and vice versa.
  • Expose a method in skops.hub_utils to add a configuration flag to use scikit-learn-intelex
    • Should probably also add a tag, so that we can have better visibility on models using this option.
  • On the https://github.com/huggingface/api-inference-community/tree/main/docker_images/sklearn side, if the flag exists, install the scikit-learn-intelex package in the environment and run the script with this command: python -m sklearnex my_application.py
    • We should check if packages on conda-forge are up-to-date: https://anaconda.org/conda-forge/scikit-learn-intelex

Project homepage: https://intel.github.io/scikit-learn-intelex/
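
As a rough sketch (the function name and the use_intelex config key below are placeholders, not a decided API), the hub_utils helper and the corresponding backend check could look something like this:

# Hypothetical hub_utils helper: flag the repo so the backend runs it under sklearnex.
import json
from pathlib import Path

def enable_intelex(local_repo):
    config_path = Path(local_repo) / "config.json"
    config = json.loads(config_path.read_text())
    config.setdefault("sklearn", {})["use_intelex"] = True
    config_path.write_text(json.dumps(config, indent=2))

# Sketch of the backend side: if the flag is present, run the app through sklearnex.
import subprocess
import sys

def run_app(local_repo):
    config = json.loads((Path(local_repo) / "config.json").read_text())
    cmd = [sys.executable, "my_application.py"]
    if config.get("sklearn", {}).get("use_intelex", False):
        cmd = [sys.executable, "-m", "sklearnex", "my_application.py"]
    subprocess.run(cmd, check=True)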

adrinjalali avatar Dec 13 '22 17:12 adrinjalali

  • Check if models trained with scikit-learn can be served with scikit-learn-intelex and vice versa.

I started with this point and found it's possible to load sklearn (w/o intelex) models with intelex and vice versa. Here are some quick test results:

I created two scripts, make-sklearn.py and make-intelex.py, which fit and save a pure sklearn model and an sklearn+intelex model, respectively. The scripts load-sklearn.py and load-intelex.py load a model (fitted with or without intelex) and call predict_proba on it, with the latter script patching sklearn with intelex. All directions seem to work (ignore the 200000 in the calls, which is just the sample size):

$ python make-sklearn.py 200000
Fit time:	14.487415

$ python make-intelex.py 200000
Intel(R) Extension for Scikit-learn* enabled (https://github.com/intel/scikit-learn-intelex)
Fit time:	14.715771


$ python load-sklearn.py 200000 sklearn.pickle
Predict time:	2.506398

$ python load-sklearn.py 200000 intelex.pickle
Predict time:	2.360727


$ python load-intelex.py 200000 sklearn.pickle
Intel(R) Extension for Scikit-learn* enabled (https://github.com/intel/scikit-learn-intelex)
Predict time:	2.501361

$ python load-intelex.py 200000 intelex.pickle
Intel(R) Extension for Scikit-learn* enabled (https://github.com/intel/scikit-learn-intelex)
Predict time:	2.408000

Interestingly, I couldn't see any speed advantage of using intelex (even with a higher sample size), but that might just be due to this particular model or this particular hardware (Intel® Xeon(R) CPU E3-1231 v3 @ 3.40GHz × 8).

Scripts:
# make-sklearn.py

import pickle
import sys
import timeit

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline, FeatureUnion
from sklearn.preprocessing import StandardScaler, PolynomialFeatures

def main(n_samples):
    X, y = make_classification(n_samples=n_samples, random_state=0)

    model = Pipeline([
        ('features', FeatureUnion([
            ('scale', StandardScaler()),
            ('poly', PolynomialFeatures()),
        ])),
        ('clf', LogisticRegression()),
    ])

    out = timeit.timeit("model.fit(X, y)", number=5, globals=locals())
    print(f"Fit time:\t{out:.6f}")

    with open('sklearn.pickle', 'wb') as f:
        pickle.dump(model, f)

if __name__ == '__main__':
    main(int(sys.argv[1]))


# make-intelex.py

import pickle
import sys
import timeit

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline, FeatureUnion
from sklearn.preprocessing import StandardScaler, PolynomialFeatures
from sklearnex import patch_sklearn

patch_sklearn()

def main(n_samples):
    X, y = make_classification(n_samples=n_samples, random_state=0)

    model = Pipeline([
        ('features', FeatureUnion([
            ('scale', StandardScaler()),
            ('poly', PolynomialFeatures()),
        ])),
        ('clf', LogisticRegression()),
    ])

    out = timeit.timeit("model.fit(X, y)", number=5, globals=locals())
    print(f"Fit time:\t{out:.6f}")

    with open('intelex.pickle', 'wb') as f:
        pickle.dump(model, f)

if __name__ == '__main__':
    main(int(sys.argv[1]))


# load-sklearn.py

import pickle
import sys
import timeit

from sklearn.datasets import make_classification

def main(n_samples, fname):
    X, y = make_classification(n_samples=n_samples, random_state=0)

    with open(fname, 'rb') as f:
        model = pickle.load(f)

    out = timeit.timeit("model.predict_proba(X)", number=5, globals=locals())
    print(f"Predict time:\t{out:.6f}")

if __name__ == '__main__':
    main(int(sys.argv[1]), sys.argv[2])


# load-intelex.py

import pickle
import sys
import timeit

from sklearn.datasets import make_classification
from sklearnex import patch_sklearn

patch_sklearn()

def main(n_samples, fname):
    X, y = make_classification(n_samples=n_samples, random_state=0)

    with open(fname, 'rb') as f:
        model = pickle.load(f)

    out = timeit.timeit("model.predict_proba(X)", number=5, globals=locals())
    print(f"Predict time:\t{out:.6f}")

if __name__ == '__main__':
    main(int(sys.argv[1]), sys.argv[2])

BenjaminBossan avatar Jan 16 '23 16:01 BenjaminBossan

yeah the speedups are only in a few models and some specific cases, not always.

adrinjalali avatar Jan 19 '23 16:01 adrinjalali

yeah the speedups are only in a few models and some specific cases, not always.

According to this article, logistic regression should be faster, though. I didn't study the benchmark in detail, so I'm not sure what differs here, but in the end it doesn't really matter, I guess.

BenjaminBossan avatar Jan 20 '23 10:01 BenjaminBossan

During the call with Intel, one next step we talked about was having an example showing where sklearn-intelex would help on the inference side.

@napetrov here's an example of how we write examples for our docs: https://github.com/skops-dev/skops/blob/main/examples/plot_model_card.py, and it gets rendered on this page. It'd be nice if your folks opened a PR here with an example, and we're happy to review it.

adrinjalali avatar Jan 23 '23 17:01 adrinjalali

@napetrov since we're not sure how the patching works on the intelex side, here's a question:

What happens if a user trains a model with sklearn, saves the model (with pickle, for instance), and then, in a new process (i.e. the hub's backend), runs the patch from intelex and loads the model? Would they ever end up using intelex?
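
For concreteness, the scenario being asked about is something like this (sketch only, with a plain LogisticRegression standing in for whatever the user trained):

# "Process 1": train and pickle with plain sklearn, no intelex involved.
import pickle
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(random_state=0)
with open("model.pickle", "wb") as f:
    pickle.dump(LogisticRegression().fit(X, y), f)

# "Process 2" (the hub's backend): patch first, then load the pickle.
from sklearnex import patch_sklearn
patch_sklearn()

with open("model.pickle", "rb") as f:
    model = pickle.load(f)
model.predict(X)  # does this ever hit intelex?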

adrinjalali avatar Jan 24 '23 13:01 adrinjalali

  • Check if models trained with scikit-learn can be served with scikit-learn-intelex and vice versa.

I started with this point and found it's possible to load sklearn (w/o intelex) models with intelex and vice versa. [...]

patch_sklearn() function should be called prior to sklearn exports to make it work. So no acceleration observed because stock scikit-learn is used.

napetrov avatar Jan 24 '23 15:01 napetrov

patch_sklearn() function should be called prior to sklearn exports to make it work

Ah, you mean prior to sklearn imports, right? Thanks, good catch. I changed the scripts to call patch_sklearn() first, like so:

import pickle
import sys
import timeit

from sklearnex import patch_sklearn
patch_sklearn()

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
...

However, the results were still pretty much the same as before:

$ python make-sklearn.py 200000
Fit time:	14.736895

$ python make-intelex.py 200000
Intel(R) Extension for Scikit-learn* enabled (https://github.com/intel/scikit-learn-intelex)
Fit time:	15.247209


$ python load-sklearn.py 200000 sklearn.pickle
Predict time:	2.538356

$ python load-sklearn.py 200000 intelex.pickle
Predict time:	2.392418


$ python load-intelex.py 200000 sklearn.pickle
Intel(R) Extension for Scikit-learn* enabled (https://github.com/intel/scikit-learn-intelex)
Predict time:	2.561795

$ python load-intelex.py 200000 intelex.pickle
Intel(R) Extension for Scikit-learn* enabled (https://github.com/intel/scikit-learn-intelex)
Predict time:	2.453521

@napetrov So I assume that the patch-first part is only relevant during training, right? When loading a model that was trained with the patch, the import order would not matter?

Also, is there some way we can verify that a loaded estimator correctly uses intelex under the hood?

BenjaminBossan avatar Jan 24 '23 15:01 BenjaminBossan

Sorry for the delayed response. We have not been looking at models that much, so I wanted to see what actually happens.

First, to see whether the extension has been used, you can enable verbose mode by setting the environment variable SKLEARNEX_VERBOSE=INFO.
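
For example, one could rerun the load script with the variable set and watch for SKLEARNEX INFO lines like the one shown further below (command for illustration only):

$ SKLEARNEX_VERBOSE=INFO python load-intelex.py 200000 intelex.pickle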

@BenjaminBossan - yes, currently this would only impact training, as inference is determined by the model class. It's interesting that the results do not change between stock and intelex; I need to look at the script.

As for what would happen with stock models - the answer is currently nothing, as the models have different types:

>>> type(stock)
<class 'sklearn.decomposition._pca.PCA'>
>>> type(intel)
<class 'daal4py.sklearn.decomposition._pca.PCA'>

So even with intelex enabled, nothing happens on top of a stock model:

>>> res = stock.transform(X)
>>> res = intel.transform(X)
SKLEARNEX INFO: sklearn.decomposition.PCA.transform: running accelerated version on CPU

But for simple models such as PCA, linear models, and several others, there is not much difference between the models beyond the object class and the scikit-learn version, so it should be possible for us to pick up the stock version as well. I'm not yet sure, however, how this should look from a user perspective.

So for now usage would be limited to running inference on intelex models only, but we can extend this.

Models:

>>> s
b'\x80\x04\x95\x94\x03\x00\x00\x00\x00\x00\x00\x8c\x1asklearn.decomposition._pca\x94\x8c\x03PCA\x94\x93\x94)\x81\x94}\x94(\x8c\x0cn_components\x94K\x03\x8c\x04copy\x94\x88\x8c\x06whiten\x94\x89\x8c\nsvd_solver\x94\x8c\x04auto\x94\x8c\x03tol\x94G\x00\x00\x00\x00\x00\x00\x00\x00\x8c\x0eiterated_power\x94h\t\x8c\rn_oversamples\x94K\n\x8c\x1apower_iteration_normalizer\x94h\t\x8c\x0crandom_state\x94N\x8c\x0en_features_in_\x94K\x04\x8c\x0f_fit_svd_solver\x94\x8c\x04full\x94\x8c\x05mean_\x94\x8c\x15numpy.core.multiarray\x94\x8c\x0c_reconstruct\x94\x93\x94\x8c\x05numpy\x94\x8c\x07ndarray\x94\x93\x94K\x00\x85\x94C\x01b\x94\x87\x94R\x94(K\x01K\x04\x85\x94h\x16\x8c\x05dtype\x94\x93\x94\x8c\x02f8\x94\x89\x88\x87\x94R\x94(K\x03\x8c\x01<\x94NNNJ\xff\xff\xff\xffJ\xff\xff\xff\xffK\x00t\x94b\x89C a,\xf9\xc5\x92_\x17@D\x19\xbd-ku\x08@\xb0\xf1\xd2Mb\x10\x0e@\x9a\xb5:&x0\xf3?\x94t\x94b\x8c\x0fnoise_variance_\x94h\x13\x8c\x06scalar\x94\x93\x94h"C\x08\xfe\xb7E\x03:h\x98?\x94\x86\x94R\x94\x8c\nn_samples_\x94K\x96\x8c\x0bn_features_\x94K\x04\x8c\x0bcomponents_\x94h\x15h\x18K\x00\x85\x94h\x1a\x87\x94R\x94(K\x01K\x03K\x04\x86\x94h"\x89C`}\x96;:\xf5 \xd7?\x88\xda\xaeyD\xa3\xb5\xbf*\xf8\x7fy\xd8i\xeb?\x17\xaa\x11\xd05\xee\xd6?F!st\xc6\x02\xe5?\xd3wf\x83{]\xe7? D]N\x131\xc6\xbf\x90\xa4\x03`\xb9R\xb3\xbf*\xf8\x14\x11\xfd\x9f\xe2\xbf\xd2\x87\xa6\xe4\x15"\xe3?t)m\x1c5\x84\xb3?X\xf6\xb4zsw\xe1?\x94t\x94b\x8c\rn_components_\x94K\x03\x8c\x13explained_variance_\x94h\x15h\x18K\x00\x85\x94h\x1a\x87\x94R\x94(K\x01K\x03\x85\x94h"\x89C\x18\r\x03\x9c1\xb8\xe9\x10@\xc0P\x06\xc7\xd5\x0f\xcf?\xc6\xbc\xeb\xac\x89\x05\xb4?\x94t\x94b\x8c\x19explained_variance_ratio_\x94h\x15h\x18K\x00\x85\x94h\x1a\x87\x94R\x94(K\x01K\x03\x85\x94h"\x89C\x18Tv-\x01z\x96\xed?)\xca\x00\xb3\x87+\xab?\xde\x894\xb7X\x83\x91?\x94t\x94b\x8c\x10singular_values_\x94h\x15h\x18K\x00\x85\x94h\x1a\x87\x94R\x94(K\x01K\x03\x85\x94h"\x89C\x18\xc8\x12\xee\x01\x97\x199@\x11-\xe4\x81v\r\x18@-\x8c\x82\xcb7O\x0b@\x94t\x94b\x8c\x10_sklearn_version\x94\x8c\x051.1.0\x94ub.'
>>> i
b'\x80\x04\x95\x81\x03\x00\x00\x00\x00\x00\x00\x8c"daal4py.sklearn.decomposition._pca\x94\x8c\x03PCA\x94\x93\x94)\x81\x94}\x94(\x8c\x0cn_components\x94K\x03\x8c\x04copy\x94\x88\x8c\x06whiten\x94\x89\x8c\nsvd_solver\x94\x8c\x04auto\x94\x8c\x03tol\x94G\x00\x00\x00\x00\x00\x00\x00\x00\x8c\x0eiterated_power\x94h\t\x8c\rn_oversamples\x94K\n\x8c\x1apower_iteration_normalizer\x94h\t\x8c\x0crandom_state\x94N\x8c\x0en_features_in_\x94K\x04\x8c\x0f_fit_svd_solver\x94\x8c\x04full\x94\x8c\x05mean_\x94\x8c\x15numpy.core.multiarray\x94\x8c\x0c_reconstruct\x94\x93\x94\x8c\x05numpy\x94\x8c\x07ndarray\x94\x93\x94K\x00\x85\x94C\x01b\x94\x87\x94R\x94(K\x01K\x04\x85\x94h\x16\x8c\x05dtype\x94\x93\x94\x8c\x02f8\x94\x89\x88\x87\x94R\x94(K\x03\x8c\x01<\x94NNNJ\xff\xff\xff\xffJ\xff\xff\xff\xffK\x00t\x94b\x89C b,\xf9\xc5\x92_\x17@E\x19\xbd-ku\x08@\xad\xf1\xd2Mb\x10\x0e@\x98\xb5:&x0\xf3?\x94t\x94b\x8c\x0fnoise_variance_\x94h\x13\x8c\x06scalar\x94\x93\x94h"C\x08\xc7\xb7E\x03:h\x98?\x94\x86\x94R\x94\x8c\nn_samples_\x94K\x96\x8c\x0bn_features_\x94K\x04\x8c\x0bcomponents_\x94h\x15h\x18K\x00\x85\x94h\x1a\x87\x94R\x94(K\x01K\x03K\x04\x86\x94h"\x89C`^\x96;:\xf5 \xd7?\x94\xdb\xaeyD\xa3\xb5\xbf+\xf8\x7fy\xd8i\xeb?%\xaa\x11\xd05\xee\xd6?\x96!st\xc6\x02\xe5?\x89wf\x83{]\xe7?vD]N\x131\xc6\xbfG\xa3\x03`\xb9R\xb3\xbf\xdb\xf8\x14\x11\xfd\x9f\xe2\xbf\xed\x88\xa6\xe4\x15"\xe3?\x074m\x1c5\x84\xb3?7\xf4\xb4zsw\xe1?\x94t\x94b\x8c\rn_components_\x94K\x03\x8c\x13explained_variance_\x94h\x15h\x18K\x00\x85\x94h\x1a\x87\x94R\x94(K\x01K\x03\x85\x94h"\x89C\x18\xf8\x02\x9c1\xb8\xe9\x10@\xa8N\x06\xc7\xd5\x0f\xcf?k\xbf\xeb\xac\x89\x05\xb4?\x94t\x94b\x8c\x19explained_variance_ratio_\x94h\x15h\x18K\x00\x85\x94h\x1a\x87\x94R\x94(K\x01K\x03\x85\x94h"\x89C\x18[v-\x01z\x96\xed?|\xc8\x00\xb3\x87+\xab?H\x8c4\xb7X\x83\x91?\x94t\x94b\x8c\x10singular_values_\x94h\x15h\x18K\x00\x85\x94h\x1a\x87\x94R\x94(K\x01K\x03\x85\x94h"\x89C\x18\xb9\x12\xee\x01\x97\x199@A,\xe4\x81v\r\x18@\xfb\x8d\x82\xcb7O\x0b@\x94t\x94bub.'


napetrov avatar Jan 26 '23 20:01 napetrov

@napetrov thank you for clarifying

First, to see whether the extension has been used, you can enable verbose mode by setting the environment variable SKLEARNEX_VERBOSE=INFO.

@adrinjalali Do we want to enable this in the sklearn inference docker images by default? It should not hurt when intelex is not being used, and if it is, it may help us with debugging in the future.

So for now usage would be limited to running inference on intelex models only, but we can extend this.

Okay, got it. For users, it means that if they want to benefit from intelex, they need to train their models with it. That's not a big ask, so I think it's good as is.

BenjaminBossan avatar Jan 27 '23 11:01 BenjaminBossan

Okay, got it. For users, it means that if they want to benefit from intelex, they need to train their models with it. That's not a big ask, so I think it's good as is.

Yes, but I think we can make things better, especially since the models are mostly identical. I will be looking into this.

napetrov avatar Jan 27 '23 11:01 napetrov

@adrinjalali Do we want to enable this in the sklearn inference docker images by default? It should not hurt when intelex is not being used, and if it is, it may help us with debugging in the future.

For debugging purposes that makes sense @BenjaminBossan, but I don't think users care enough, nor do we have the tools to show users that information. We also shouldn't warn users about this, since the outputs they're getting are correct either way. So I'd say we're good as things are on the backend side.

adrinjalali avatar Jan 30 '23 10:01 adrinjalali