Allow users to use scikit-learn-intelex on the API inference backend
This requirement comes from the contract between Hugging Face and Intel.
We should have a way to allow users to utilize scikit-learn-intelex at inference time. This requires a few steps:
- Check if models trained with `scikit-learn` can be served with `scikit-learn-intelex` and vice versa.
- Expose a method in `skops.hub_utils` to add a configuration flag to use `scikit-learn-intelex` (a rough sketch follows this list).
  - Should probably also add a tag, so that we can have better visibility on models using this option.
- On the https://github.com/huggingface/api-inference-community/tree/main/docker_images/sklearn side, if the flag exists, install the `scikit-learn-intelex` package in the environment and run the script with this command: `python -m sklearnex my_application.py`
- We should check if packages on `conda-forge` are up-to-date: https://anaconda.org/conda-forge/scikit-learn-intelex

Project homepage: https://intel.github.io/scikit-learn-intelex/
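As a starting point for the `skops.hub_utils` part, here is a minimal sketch of what such a helper could look like. The helper name `enable_intelex` and the `use_intelex` key are hypothetical, not an existing skops API; the sketch only assumes the local repo already contains a `config.json` with an `"sklearn"` section.

```python
# Hypothetical sketch -- neither this helper nor the "use_intelex" key exist in skops yet.
import json
from pathlib import Path


def enable_intelex(local_repo: str) -> None:
    """Flag a local model repo so that the inference backend knows to run it
    under scikit-learn-intelex (e.g. via ``python -m sklearnex my_application.py``)."""
    config_path = Path(local_repo) / "config.json"
    config = json.loads(config_path.read_text())
    config.setdefault("sklearn", {})["use_intelex"] = True  # hypothetical flag
    config_path.write_text(json.dumps(config, indent=2, sort_keys=True))
```

The backend could then read the flag from `config.json` and, if it is set, install `scikit-learn-intelex` and wrap the entrypoint with `python -m sklearnex`; a corresponding model card tag would give us visibility into how often the option is used.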
> - Check if models trained with `scikit-learn` can be served with `scikit-learn-intelex` and vice versa.
I started with this point and found it's possible to load sklearn (without intelex) models with intelex and vice versa. Here are some quick test results:
I created two scripts, `make-sklearn.py` and `make-intelex.py`, which fit and save a pure sklearn model and an sklearn+intelex model, respectively. The scripts `load-sklearn.py` and `load-intelex.py` load a model (fitted with or without intelex) and call `predict_proba` on it, with the latter script patching sklearn with intelex. All directions seem to work (ignore the `200000` in the calls, which is just the sample size):
$ python make-sklearn.py 200000
Fit time: 14.487415
$ python make-intelex.py 200000
Intel(R) Extension for Scikit-learn* enabled (https://github.com/intel/scikit-learn-intelex)
Fit time: 14.715771
$ python load-sklearn.py 200000 sklearn.pickle
Predict time: 2.506398
$ python load-sklearn.py 200000 intelex.pickle
Predict time: 2.360727
$ python load-intelex.py 200000 sklearn.pickle
Intel(R) Extension for Scikit-learn* enabled (https://github.com/intel/scikit-learn-intelex)
Predict time: 2.501361
$ python load-intelex.py 200000 intelex.pickle
Intel(R) Extension for Scikit-learn* enabled (https://github.com/intel/scikit-learn-intelex)
Predict time: 2.408000
Interestingly, I couldn't see any speed advantage from using intelex (even with a higher sample size), but that might just be due to this particular model or this particular hardware (Intel(R) Xeon(R) CPU E3-1231 v3 @ 3.40GHz × 8).
Click to show scripts
# make-sklearn.py
import pickle
import sys
import timeit
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline, FeatureUnion
from sklearn.preprocessing import StandardScaler, PolynomialFeatures
def main(n_samples):
X, y = make_classification(n_samples=n_samples, random_state=0)
model = Pipeline([
('features', FeatureUnion([
('scale', StandardScaler()),
('poly', PolynomialFeatures()),
])),
('clf', LogisticRegression()),
])
out = timeit.timeit("model.fit(X, y)", number=5, globals=locals())
print(f"Fit time:\t{out:.6f}")
with open('sklearn.pickle', 'wb') as f:
pickle.dump(model, f)
if __name__ == '__main__':
main(int(sys.argv[1]))
# make-intelex.py
import pickle
import sys
import timeit
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline, FeatureUnion
from sklearn.preprocessing import StandardScaler, PolynomialFeatures
from sklearnex import patch_sklearn
patch_sklearn()
def main(n_samples):
X, y = make_classification(n_samples=n_samples, random_state=0)
model = Pipeline([
('features', FeatureUnion([
('scale', StandardScaler()),
('poly', PolynomialFeatures()),
])),
('clf', LogisticRegression()),
])
out = timeit.timeit("model.fit(X, y)", number=5, globals=locals())
print(f"Fit time:\t{out:.6f}")
with open('intelex.pickle', 'wb') as f:  # save the intelex-trained model under a separate name
pickle.dump(model, f)
if __name__ == '__main__':
main(int(sys.argv[1]))
# load-sklearn.py
import pickle
import sys
import timeit
from sklearn.datasets import make_classification
def main(n_samples, fname):
X, y = make_classification(n_samples=n_samples, random_state=0)
with open(fname, 'rb') as f:
model = pickle.load(f)
out = timeit.timeit("model.predict_proba(X)", number=5, globals=locals())
print(f"Predict time:\t{out:.6f}")
if __name__ == '__main__':
main(int(sys.argv[1]), sys.argv[2])
# load-intelex.py
import pickle
import sys
import timeit
from sklearn.datasets import make_classification
from sklearnex import patch_sklearn
patch_sklearn()
def main(n_samples, fname):
X, y = make_classification(n_samples=n_samples, random_state=0)
with open(fname, 'rb') as f:
model = pickle.load(f)
out = timeit.timeit("model.predict_proba(X)", number=5, globals=locals())
print(f"Predict time:\t{out:.6f}")
if __name__ == '__main__':
main(int(sys.argv[1]), sys.argv[2])
yeah the speedups are only in a few models and some specific cases, not always.
> yeah the speedups are only in a few models and some specific cases, not always.
According to this article, logistic regression should be faster, though. I didn't study the benchmark in detail, so I'm not sure what differs here, but in the end it doesn't really matter, I guess.
During the call with Intel, one of the next steps we discussed was to put together an example where scikit-learn-intelex would help on the inference side.
@napetrov here's an example of how we write examples for our docs: https://github.com/skops-dev/skops/blob/main/examples/plot_model_card.py, and it gets rendered on this page. It'd be nice if your folks could open a PR here with an example; we're happy to review it.
@napetrov since we're not sure how the patching is working on the intelex side, here's a question:
What happens if a user trains a model with sklearn, saves the model with pickle for instance, and then in a new process (i.e. the hub's backend) runs the patch from intelex and loads the model? Would they ever end up using intelex?
> I started with this point and found it's possible to load sklearn (without intelex) models with intelex and vice versa.
patch_sklearn() function should be called prior to sklearn exports to make it work. So no acceleration observed because stock scikit-learn is used.
> patch_sklearn() function should be called prior to sklearn exports to make it work
Ah, you mean prior to sklearn imports, right? Thanks, good catch. I changed the scripts to call patch_sklearn() first, like so:
import pickle
import sys
import timeit
from sklearnex import patch_sklearn
patch_sklearn()
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
...
However, the results were still pretty much the same as before:
$ python make-sklearn.py 200000
Fit time: 14.736895
$ python make-intelex.py 200000
Intel(R) Extension for Scikit-learn* enabled (https://github.com/intel/scikit-learn-intelex)
Fit time: 15.247209
$ python load-sklearn.py 200000 sklearn.pickle
Predict time: 2.538356
$ python load-sklearn.py 200000 intelex.pickle
Predict time: 2.392418
$ python load-intelex.py 200000 sklearn.pickle
Intel(R) Extension for Scikit-learn* enabled (https://github.com/intel/scikit-learn-intelex)
Predict time: 2.561795
$ python load-intelex.py 200000 intelex.pickle
Intel(R) Extension for Scikit-learn* enabled (https://github.com/intel/scikit-learn-intelex)
Predict time: 2.453521
@napetrov So I assume that the patch-before-import part is only relevant during training, right? When loading the model, if it was trained with the patch, the import order would not matter?
Also, is there some way we can verify that a loaded estimator correctly uses intelex under the hood?
Sorry for the delayed response. We had not been looking at models that much, so I was checking what actually happens.
First, to see if the extension has been used, you can enable verbose mode by setting the environment variable SKLEARNEX_VERBOSE=INFO.
@BenjaminBossan - yes, currently this would only impact training, as inference is determined by the model class. It is interesting that the results do not change between stock and intelex; I need to look at the script.
In terms of what would happen for stock models - the answer is currently nothing, as the models have different types:
>>> type(stock)
<class 'sklearn.decomposition._pca.PCA'>
>>> type(intel)
<class 'daal4py.sklearn.decomposition._pca.PCA'>
So regardless of enabling intelex on top of a stock model, nothing would happen:
>>> res = stock.transform(X)
>>> res = intel.transform(X)
SKLEARNEX INFO: sklearn.decomposition.PCA.transform: running accelerated version on CPU
But for simple models such as PCA, linear models, and several others, there is not much difference between the models - only the object class and the scikit-learn version differ. So it should be possible for us to pick up the stock version as well; I am not sure yet, however, how this should look from the user's perspective.
So for now, usage would be limited to running inference on intelex models only, but we can extend this.
Models
>>> s
b'\x80\x04\x95\x94\x03\x00\x00\x00\x00\x00\x00\x8c\x1asklearn.decomposition._pca\x94\x8c\x03PCA\x94\x93\x94)\x81\x94}\x94(\x8c\x0cn_components\x94K\x03\x8c\x04copy\x94\x88\x8c\x06whiten\x94\x89\x8c\nsvd_solver\x94\x8c\x04auto\x94\x8c\x03tol\x94G\x00\x00\x00\x00\x00\x00\x00\x00\x8c\x0eiterated_power\x94h\t\x8c\rn_oversamples\x94K\n\x8c\x1apower_iteration_normalizer\x94h\t\x8c\x0crandom_state\x94N\x8c\x0en_features_in_\x94K\x04\x8c\x0f_fit_svd_solver\x94\x8c\x04full\x94\x8c\x05mean_\x94\x8c\x15numpy.core.multiarray\x94\x8c\x0c_reconstruct\x94\x93\x94\x8c\x05numpy\x94\x8c\x07ndarray\x94\x93\x94K\x00\x85\x94C\x01b\x94\x87\x94R\x94(K\x01K\x04\x85\x94h\x16\x8c\x05dtype\x94\x93\x94\x8c\x02f8\x94\x89\x88\x87\x94R\x94(K\x03\x8c\x01<\x94NNNJ\xff\xff\xff\xffJ\xff\xff\xff\xffK\x00t\x94b\x89C a,\xf9\xc5\x92_\x17@D\x19\xbd-ku\x08@\xb0\xf1\xd2Mb\x10\x0e@\x9a\xb5:&x0\xf3?\x94t\x94b\x8c\x0fnoise_variance_\x94h\x13\x8c\x06scalar\x94\x93\x94h"C\x08\xfe\xb7E\x03:h\x98?\x94\x86\x94R\x94\x8c\nn_samples_\x94K\x96\x8c\x0bn_features_\x94K\x04\x8c\x0bcomponents_\x94h\x15h\x18K\x00\x85\x94h\x1a\x87\x94R\x94(K\x01K\x03K\x04\x86\x94h"\x89C`}\x96;:\xf5 \xd7?\x88\xda\xaeyD\xa3\xb5\xbf*\xf8\x7fy\xd8i\xeb?\x17\xaa\x11\xd05\xee\xd6?F!st\xc6\x02\xe5?\xd3wf\x83{]\xe7? D]N\x131\xc6\xbf\x90\xa4\x03`\xb9R\xb3\xbf*\xf8\x14\x11\xfd\x9f\xe2\xbf\xd2\x87\xa6\xe4\x15"\xe3?t)m\x1c5\x84\xb3?X\xf6\xb4zsw\xe1?\x94t\x94b\x8c\rn_components_\x94K\x03\x8c\x13explained_variance_\x94h\x15h\x18K\x00\x85\x94h\x1a\x87\x94R\x94(K\x01K\x03\x85\x94h"\x89C\x18\r\x03\x9c1\xb8\xe9\x10@\xc0P\x06\xc7\xd5\x0f\xcf?\xc6\xbc\xeb\xac\x89\x05\xb4?\x94t\x94b\x8c\x19explained_variance_ratio_\x94h\x15h\x18K\x00\x85\x94h\x1a\x87\x94R\x94(K\x01K\x03\x85\x94h"\x89C\x18Tv-\x01z\x96\xed?)\xca\x00\xb3\x87+\xab?\xde\x894\xb7X\x83\x91?\x94t\x94b\x8c\x10singular_values_\x94h\x15h\x18K\x00\x85\x94h\x1a\x87\x94R\x94(K\x01K\x03\x85\x94h"\x89C\x18\xc8\x12\xee\x01\x97\x199@\x11-\xe4\x81v\r\x18@-\x8c\x82\xcb7O\x0b@\x94t\x94b\x8c\x10_sklearn_version\x94\x8c\x051.1.0\x94ub.'
>>> i
b'\x80\x04\x95\x81\x03\x00\x00\x00\x00\x00\x00\x8c"daal4py.sklearn.decomposition._pca\x94\x8c\x03PCA\x94\x93\x94)\x81\x94}\x94(\x8c\x0cn_components\x94K\x03\x8c\x04copy\x94\x88\x8c\x06whiten\x94\x89\x8c\nsvd_solver\x94\x8c\x04auto\x94\x8c\x03tol\x94G\x00\x00\x00\x00\x00\x00\x00\x00\x8c\x0eiterated_power\x94h\t\x8c\rn_oversamples\x94K\n\x8c\x1apower_iteration_normalizer\x94h\t\x8c\x0crandom_state\x94N\x8c\x0en_features_in_\x94K\x04\x8c\x0f_fit_svd_solver\x94\x8c\x04full\x94\x8c\x05mean_\x94\x8c\x15numpy.core.multiarray\x94\x8c\x0c_reconstruct\x94\x93\x94\x8c\x05numpy\x94\x8c\x07ndarray\x94\x93\x94K\x00\x85\x94C\x01b\x94\x87\x94R\x94(K\x01K\x04\x85\x94h\x16\x8c\x05dtype\x94\x93\x94\x8c\x02f8\x94\x89\x88\x87\x94R\x94(K\x03\x8c\x01<\x94NNNJ\xff\xff\xff\xffJ\xff\xff\xff\xffK\x00t\x94b\x89C b,\xf9\xc5\x92_\x17@E\x19\xbd-ku\x08@\xad\xf1\xd2Mb\x10\x0e@\x98\xb5:&x0\xf3?\x94t\x94b\x8c\x0fnoise_variance_\x94h\x13\x8c\x06scalar\x94\x93\x94h"C\x08\xc7\xb7E\x03:h\x98?\x94\x86\x94R\x94\x8c\nn_samples_\x94K\x96\x8c\x0bn_features_\x94K\x04\x8c\x0bcomponents_\x94h\x15h\x18K\x00\x85\x94h\x1a\x87\x94R\x94(K\x01K\x03K\x04\x86\x94h"\x89C`^\x96;:\xf5 \xd7?\x94\xdb\xaeyD\xa3\xb5\xbf+\xf8\x7fy\xd8i\xeb?%\xaa\x11\xd05\xee\xd6?\x96!st\xc6\x02\xe5?\x89wf\x83{]\xe7?vD]N\x131\xc6\xbfG\xa3\x03`\xb9R\xb3\xbf\xdb\xf8\x14\x11\xfd\x9f\xe2\xbf\xed\x88\xa6\xe4\x15"\xe3?\x074m\x1c5\x84\xb3?7\xf4\xb4zsw\xe1?\x94t\x94b\x8c\rn_components_\x94K\x03\x8c\x13explained_variance_\x94h\x15h\x18K\x00\x85\x94h\x1a\x87\x94R\x94(K\x01K\x03\x85\x94h"\x89C\x18\xf8\x02\x9c1\xb8\xe9\x10@\xa8N\x06\xc7\xd5\x0f\xcf?k\xbf\xeb\xac\x89\x05\xb4?\x94t\x94b\x8c\x19explained_variance_ratio_\x94h\x15h\x18K\x00\x85\x94h\x1a\x87\x94R\x94(K\x01K\x03\x85\x94h"\x89C\x18[v-\x01z\x96\xed?|\xc8\x00\xb3\x87+\xab?H\x8c4\xb7X\x83\x91?\x94t\x94b\x8c\x10singular_values_\x94h\x15h\x18K\x00\x85\x94h\x1a\x87\x94R\x94(K\x01K\x03\x85\x94h"\x89C\x18\xb9\x12\xee\x01\x97\x199@A,\xe4\x81v\r\x18@\xfb\x8d\x82\xcb7O\x0b@\x94t\x94bub.'
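Based on the class names shown above, here is a minimal sketch of how the backend could check whether an unpickled estimator is an intelex one. The module-prefix check is a heuristic derived from those type names, not an official sklearnex API, and the file name is just the one used by the scripts earlier in this thread.

```python
import pickle


def uses_intelex(estimator) -> bool:
    # Heuristic based on the types shown above: intelex estimators live under
    # daal4py.sklearn.* (or sklearnex.*) module paths, stock ones under sklearn.*.
    return type(estimator).__module__.startswith(("daal4py.", "sklearnex."))


def pipeline_uses_intelex(pipe) -> bool:
    # A Pipeline object itself keeps its stock sklearn class; only the steps
    # can be intelex classes, so check each step.
    return any(uses_intelex(step) for _, step in pipe.steps)


with open("intelex.pickle", "rb") as f:  # e.g. the pipeline saved by the corrected make-intelex.py
    model = pickle.load(f)

print(pipeline_uses_intelex(model))
```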
@napetrov thank you for clarifying
> First, to see if the extension has been used, you can enable verbose mode by setting the environment variable SKLEARNEX_VERBOSE=INFO.
@adrinjalali Do we want to enable this in the sklearn inference docker images by default? It should not hurt when intelex is not being used, and if it is, it may help us with debugging in the future.
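For reference, a rough sketch of what turning this on could look like at the top of the backend's entrypoint. The SKLEARNEX_VERBOSE variable comes from the comment above; setting it before sklearnex/sklearn are imported is an assumption about where it needs to happen.

```python
import os

# Assumption: the variable must be set before sklearnex/sklearn are imported
# so the verbose logger is configured from the start.
os.environ.setdefault("SKLEARNEX_VERBOSE", "INFO")

# ... the rest of the entrypoint (model loading, request handling) stays unchanged.
# Calls that dispatch to intelex then log lines like the one quoted earlier:
# SKLEARNEX INFO: sklearn.decomposition.PCA.transform: running accelerated version on CPU
```

Alternatively, this could simply be an ENV line in the sklearn inference docker image, which is probably closer to how the docker_images setup would do it.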
> So for now, usage would be limited to running inference on intelex models only, but we can extend this.
Okay, got it. For users, it means that if they want to benefit from intelex, they need to train their models with it. That's not a big ask, so I think it's good as is.
> Okay, got it. For users, it means that if they want to benefit from intelex, they need to train their models with it. That's not a big ask, so I think it's good as is.
Yes, but I think we can make things better, especially since the models are mostly identical. I will be looking into this.
> @adrinjalali Do we want to enable this in the sklearn inference docker images by default? It should not hurt when intelex is not being used, and if it is, it may help us with debugging in the future.
For debugging purposes that makes sense, @BenjaminBossan, but I don't think users care enough, or that we have the tools to show users the information. We also shouldn't be warning users about this, since the outputs they're getting are correct either way. So I'd say we're good as things are on the backend side.