sklearn-porter Changed prediction to run with multithreading

Open skjerns opened this issue 5 years ago • 6 comments

I noticed that predict and integrity_score run quite slowly.

I've added functionality to run the predictions in multiple threads, which makes them much faster. This adds a dependency on joblib, but joblib is already a dependency of sklearn, so no new dependency is really introduced. This makes the code ~8x faster (with 8 threads).
I've changed the call from Shell.check_output to subprocess.check_output. Shell calls subprocess.check_output in the background anyway, but calling it directly gives another speedup of ~3-4x.

Combined, a total speedup of ~30x is possible.
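
A minimal sketch of the idea behind both changes, not the code from the PR: each sample is handed to an external program through subprocess.check_output, and the calls are spread over a thread pool. It uses the standard library's ThreadPoolExecutor instead of joblib so that it runs without extra packages. FAKE_ESTIMATOR, predict_one and predict_many are hypothetical names, and the stand-in program only echoes the number of features it received.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for the compiled estimator binary (assumption):
# it prints how many feature arguments it received.
FAKE_ESTIMATOR = [sys.executable, "-c", "import sys; print(len(sys.argv) - 1)"]


def predict_one(features):
    """Run one external prediction by calling subprocess.check_output directly."""
    cmd = FAKE_ESTIMATOR + [repr(float(x)) for x in features]
    out = subprocess.check_output(cmd)
    return int(out.decode().strip())


def predict_many(rows, n_threads=8):
    """Spread the subprocess calls over a thread pool; results keep input order."""
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        return list(pool.map(predict_one, rows))
```

Threads are sufficient here because each worker spends its time waiting on a child process rather than executing Python code.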

Example:

import numpy as np
import sklearn_porter
from sklearn.ensemble import RandomForestClassifier

train_x = np.random.rand(1000, 8)
train_y = np.random.randint(0, 4, 1000)

rfc = RandomForestClassifier(n_estimators=10)
rfc.fit(train_x, train_y)
        
porter = sklearn_porter.Porter(rfc, language='c')
porter.integrity_score(train_x) # ~30 times faster.

I've also seen that integrity_score runs perfectly fine on Windows, provided that gcc is installed (and the hard-coded blocking of Windows is removed). Do you think we can remove the blocking of the function for Windows platforms?

May 15 '19 12:05 skjerns

Hello @skjerns ,

this is great! I will merge your PR and adapt it to the new major release. Until it's done I will keep this PR open.

Best, Darius

Jun 25 '19 11:06 nok

Meanwhile, I have found another solution that speeds things up to almost real-time predictions:

I altered the int main(){..} so that it accepts several data points as input, not just one. This way, I can verify several hundred inputs in one call. I'll open another PR proposing this soon if you want. However, it's a somewhat deeper alteration of the code and needs to be done for each language individually, so it might not be preferable.

Example for C:

// Requires <stdio.h> and <stdlib.h>; n_features, n_classes, labels and
// predict_class_idx are expected to be defined elsewhere in the generated file.
int main(int argc, const char * argv[]) {
    // Expect N x n_features values, flattened row by row.
    if ((argc - 1) % n_features != 0) {
        printf("Need to supply N x %d features flattened, %d were given\n", n_features, argc - 1);
        return 1;
    }
    double features[n_features];
    int n_rows = (argc - 1) / n_features;
    for (int row = 0; row < n_rows; row++) {
        printf("row: %d\n", row);
        // Copy this row's slice of argv into the feature buffer.
        for (int i = 0; i < n_features; i++) {
            features[i] = atof(argv[i + row * n_features + 1]);
        }
        // calculate outputs for debugging
        int class_idx = predict_class_idx(features);
        // same as calling label = predict(features)
        int label = labels[class_idx];

        // now we print the results
        printf("labels: ");
        for (int i = 0; i < n_classes; i++) {
            printf("%d ", labels[i]);
        }
        printf("\n");
        printf("class_idx: %d\n", class_idx);
        printf("label: %d", label);
        printf("\n\n");
    }
    return 0;
}
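
On the calling side, the batched main needs its inputs flattened into one argument list and its output split back into one label per row. A rough sketch, assuming the program prints one "label: <int>" line per row as in the C example; flatten_rows and parse_labels are hypothetical helpers, not part of sklearn-porter.

```python
def flatten_rows(rows):
    """Turn N rows of features into one flat list of command-line arguments."""
    return [repr(float(value)) for row in rows for value in row]


def parse_labels(stdout_text):
    """Collect the integer after each 'label: ' line, one per input row.

    The 'labels: ...' line is skipped because it does not start with 'label: '.
    """
    return [
        int(line.split(":", 1)[1])
        for line in stdout_text.splitlines()
        if line.startswith("label: ")
    ]
```

The flattened list would be appended to the path of the compiled binary and passed to subprocess.check_output. Operating systems cap the total length of a command line, so very large batches would likely have to be split into chunks.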

Jun 25 '19 12:06 skjerns

In the next release all internal predictions will be multiprocessed by default. Here is the relevant part: https://github.com/nok/sklearn-porter/blob/release/1.0.0/sklearn_porter/Estimator.py#L652-L682

> I altered the int main(){..} such that it accepts several data points as input, not just one. This way, I can verify several hundred inputs in one call. I'll make another PR proposing this soon if you want. However, it's a bit deeper alteration of the code and needs to be done for each language individually, so might not be preferable.

Yes, SIMD operations would be nice. But for now I prefer a simple and intuitive starting point where a developer can change and extend the generated source code easily. Nevertheless I see and understand the need, so I would suggest that we create an additional interactive example (something like that) where we demonstrate the customization and the final benefit. The current scaffold of a template is here.

What do you think?

Dec 19 '19 00:12 nok

> I've also seen that integrity_score runs perfectly fine on Windows, given that gcc is installed (and the hard-coded blocking of windows is removed). Do you think we can remove the blocking of the function for windows platforms?

Thanks for the note! That sounds great. I removed all checks that are related to the operating system: https://github.com/nok/sklearn-porter/blob/release/1.0.0/sklearn_porter/Estimator.py#L701

Dec 19 '19 00:12 nok

> > I've also seen that integrity_score runs perfectly fine on Windows, given that gcc is installed (and the hard-coded blocking of windows is removed). Do you think we can remove the blocking of the function for windows platforms?

> Thanks for the note! That sounds great. I removed all checks that are related to the operating system: https://github.com/nok/sklearn-porter/blob/release/1.0.0/sklearn_porter/Estimator.py#L701

Great! It might be handy to include a gcc_installed() function that prints warnings, etc.

edit: Ah, I guess that's done by DEPENDENCIES
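
For illustration only, a check like the suggested gcc_installed() could be sketched with the standard library as below. This is a hypothetical helper, not the mechanism the library uses; as the edit above notes, the real check is apparently covered by DEPENDENCIES. The compiler parameter is an added assumption.

```python
import shutil
import warnings


def gcc_installed(compiler="gcc"):
    """Return True if the compiler is found on PATH, otherwise warn and return False."""
    if shutil.which(compiler) is None:
        warnings.warn(
            f"'{compiler}' was not found on PATH; compiled predictions will not work."
        )
        return False
    return True
```

shutil.which behaves the same way on Windows and on Unix-like systems, so no platform-specific branch is needed.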

Dec 21 '19 10:12 skjerns

> In the next release all internal predictions will be multiprocessed by default. Here is the relevant part: https://github.com/nok/sklearn-porter/blob/release/1.0.0/sklearn_porter/Estimator.py#L652-L682

Great! Nice.

> > I altered the int main(){..} such that it accepts several data points as input, not just one. This way, I can verify several hundred inputs in one call. I'll make another PR proposing this soon if you want. However, it's a bit deeper alteration of the code and needs to be done for each language individually, so might not be preferable.

> Yes, SIMD operations would be nice. But for now I prefer a simple and intuitive starting point where a developer can change and extend the generated source code easily. Nevertheless I see and understand the need, so I would suggest that we create an additional interactive example (something like that) where we demonstrate the customization and the final benefit. The current scaffold of a template is here.

> What do you think?

I'll leave it up to you. Having the source code of individual language templates would be feasible I guess?

Dec 21 '19 10:12 skjerns
