sklearn-porter
Changed prediction to run with multithreading
I saw that predict and integrity_score run quite slowly.
- I've added functionality to let it run with threading, making it much faster. It adds a dependency on joblib, but this is already a dependency of sklearn, so no new dependencies are really added. This makes the code ~8x faster (with 8 threads).
- I've changed the call from Shell.check_output to subprocess.check_output. Shell calls subprocess.check_output in the background anyway, but this way we get another speedup of ~3-4x, so a total speedup of ~30x is possible.
Example:
import numpy as np
import sklearn_porter
from sklearn.ensemble import RandomForestClassifier
train_x = np.random.rand(1000, 8)
train_y = np.random.randint(0, 4, 1000)
rfc = RandomForestClassifier(n_estimators=10)
rfc.fit(train_x, train_y)
porter = sklearn_porter.Porter(rfc, language='c')
porter.integrity_score(train_x) # ~30 times faster.
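For illustration, the threading idea described above might look roughly like the sketch below. It is not the code from this PR: the PR uses joblib, while this sketch swaps in the standard library's ThreadPoolExecutor to stay dependency-free, and the names predict_one and predict_many are hypothetical. A small Python one-liner stands in for the compiled estimator binary.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def predict_one(cmd, features):
    """Run the external estimator once and return its stdout as text."""
    # Features are passed as command-line arguments, one per value.
    args = list(cmd) + [repr(float(f)) for f in features]
    return subprocess.check_output(args).decode("utf-8").strip()


def predict_many(cmd, rows, n_threads=8):
    """Run one subprocess per row, spread over a pool of threads.

    Threads are enough here because each worker mostly waits
    for an external process to finish.
    """
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        # pool.map returns results in the same order as the input rows.
        return list(pool.map(lambda row: predict_one(cmd, row), rows))


if __name__ == "__main__":
    # Stand-in for the compiled binary (assumption): a command that
    # prints how many feature arguments it received.
    fake_binary = [sys.executable, "-c", "import sys; print(len(sys.argv) - 1)"]
    rows = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
    print(predict_many(fake_binary, rows, n_threads=2))
```

The per-row subprocess start-up cost is still paid for every row; the threads only overlap the waiting time.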
I've also seen that integrity_score runs perfectly fine on Windows, given that gcc is installed (and the hard-coded blocking of Windows is removed). Do you think we can remove the blocking of the function for Windows platforms?
Hello @skjerns ,
this is great! I will merge your PR and adapt it to the new major release. Until it's done I will keep this PR open.
Best, Darius
Meanwhile I have found another solution that speeds things up to almost real-time predictions:
I altered the int main(){..} such that it accepts several data points as input, not just one. This way, I can verify several hundred inputs in one call. I'll make another PR proposing this soon if you want. However, it's a deeper alteration of the code and needs to be done for each language individually, so it might not be preferable.
Example for C:
int main(int argc, const char * argv[]) {
    // the arguments must form complete rows of n_features values each
    if ((argc-1) % n_features != 0) {
        printf("Need to supply N x %d features flattened, %d were given\n", n_features, argc-1);
        return 1;
    }
    double features[n_features];
    int n_rows = (argc-1) / n_features;
    for (int row = 0; row < n_rows; row++) {
        printf("row: %d\n", row);
        // copy this row's values out of the flattened argument list
        for (int i = 0; i < n_features; i++) {
            features[i] = atof(argv[i+row*n_features+1]);
        }
        // calculate outputs for debugging
        int class_idx = predict_class_idx(features);
        // same as calling label = predict(features)
        int label = labels[class_idx];
        // now we print the results
        printf("labels: ");
        for (int i = 0; i < n_classes; i++) {
            printf("%d ", labels[i]);
        }
        printf("\n");
        printf("class_idx: %d\n", class_idx);
        printf("label: %d", label);
        printf("\n\n");
    }
    return 0;
}
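On the Python side, calling such a batched binary could be sketched as follows. This is only an illustration under assumptions: the helper names build_batch_args and parse_batch_labels and the path ./estimator are hypothetical, and the parser assumes the output format printed by the C example above (one "label: N" line per row).

```python
import re


def build_batch_args(binary_path, rows):
    """Flatten rows into one argument list: [binary, x00, x01, ..., x10, ...]."""
    args = [binary_path]
    for row in rows:
        args.extend(repr(float(value)) for value in row)
    return args


def parse_batch_labels(stdout_text):
    """Collect the integer from every 'label: N' line, in row order.

    The 'labels: ...' lines are skipped because the pattern
    requires a colon directly after the word 'label'.
    """
    pattern = r"^label: (-?\d+)[ \t\r]*$"
    return [int(m) for m in re.findall(pattern, stdout_text, flags=re.MULTILINE)]


if __name__ == "__main__":
    # With a real compiled binary (path is an assumption) one might call:
    #   out = subprocess.check_output(build_batch_args("./estimator", X))
    #   labels = parse_batch_labels(out.decode("utf-8"))
    sample = (
        "row: 0\nlabels: 0 1 2 3 \nclass_idx: 2\nlabel: 2\n\n"
        "row: 1\nlabels: 0 1 2 3 \nclass_idx: 0\nlabel: 0\n\n"
    )
    print(parse_batch_labels(sample))
```

Operating systems limit the total length of a command line, so a very large input would likely need to be split into several calls.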
In the next release all internal predictions will be multiprocessed by default. Here is the relevant part: https://github.com/nok/sklearn-porter/blob/release/1.0.0/sklearn_porter/Estimator.py#L652-L682
I altered the int main(){..} such that it accepts several data points as input, not just one. This way, I can verify several hundred inputs in one call. I'll make another PR proposing this soon if you want. However, it's a bit deeper alteration of the code and needs to be done for each language individually, so might not be preferable.
Yes, SIMD operations would be nice. But for now I prefer a simple and intuitive starting point where a developer can change and extend the generated source code easily. Nevertheless I see and understand the need, so I would suggest that we create an additional interactive example (something like that) where we demonstrate the customization and the final benefit. The current scaffold of a template is here.
What do you think?
I've also seen that integrity_score runs perfectly fine on Windows, given that gcc is installed (and the hard-coded blocking of windows is removed). Do you think we can remove the blocking of the function for windows platforms?
Thanks for the note! That sounds great. I removed all checks that are related to the operating system: https://github.com/nok/sklearn-porter/blob/release/1.0.0/sklearn_porter/Estimator.py#L701
Great! It might be handy to include a gcc_installed() function with printed warnings etc.
Edit: Ah, I guess that's done by DEPENDENCIES.
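A check like the one suggested here could be sketched as below. The function name gcc_installed comes from the comment above, but the body is an assumption and not part of sklearn-porter; it relies only on shutil.which from the standard library to look the compiler up on the PATH.

```python
import shutil
import warnings


def gcc_installed(name="gcc"):
    """Return True if the given compiler executable is found on the PATH."""
    if shutil.which(name) is None:
        # Warn instead of raising so callers can decide how to proceed.
        warnings.warn(
            "Compiler '{}' was not found on the PATH; "
            "compiled predictions will not work.".format(name)
        )
        return False
    return True


if __name__ == "__main__":
    print(gcc_installed())
```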
In the next release all internal predictions will be multiprocessed by default. Here is the relevant part: https://github.com/nok/sklearn-porter/blob/release/1.0.0/sklearn_porter/Estimator.py#L652-L682
Great! Nice.
I altered the int main(){..} such that it accepts several data points as input, not just one. This way, I can verify several hundred inputs in one call. I'll make another PR proposing this soon if you want. However, it's a bit deeper alteration of the code and needs to be done for each language individually, so might not be preferable.
Yes, SIMD operations would be nice. But for now I prefer a simple and intuitive starting point where a developer can change and extend the generated source code easily. Nevertheless I see and understand the need, so I would suggest that we create an additional interactive example (something like that) where we demonstrate the customization and the final benefit. The current scaffold of a template is here.
What do you think?
I'll leave it up to you. Having the source code of individual language templates would be feasible I guess?