hummingbird
fixup scikit-learn PolynomialFeatures
In #269, the basics are implemented, but to be fully complete we need: (1) degree larger than 2 (2) support for interaction_only (3) more tests
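To make items (1) and (2) concrete, here is a minimal pure-Python sketch of which input columns are multiplied together for each output feature. It is not Hummingbird's converter code. The helper names `poly_term_indices` and `poly_transform_row` are hypothetical. The term ordering is an assumption meant to mirror scikit-learn's dense output (bias, then degree 1, then degree 2, and so on), and should be checked against `PolynomialFeatures` itself before being relied on.

```python
from itertools import chain, combinations, combinations_with_replacement
from math import comb, prod


def poly_term_indices(n_features, degree, interaction_only=False, include_bias=True):
    """Return one tuple of input-column indices per output feature.

    Hypothetical helper: each tuple lists the columns whose product forms
    one polynomial term; the empty tuple stands for the bias column.
    """
    # interaction_only forbids repeating a column inside a term (no x0 * x0).
    pick = combinations if interaction_only else combinations_with_replacement
    start = 0 if include_bias else 1
    return list(chain.from_iterable(
        pick(range(n_features), d) for d in range(start, degree + 1)
    ))


def poly_transform_row(row, degree, interaction_only=False, include_bias=True):
    """Expand a single sample (a list of numbers) into its polynomial terms."""
    terms = poly_term_indices(len(row), degree, interaction_only, include_bias)
    # prod() over an empty term is 1, which gives the bias column.
    return [prod(row[i] for i in term) for term in terms]


if __name__ == "__main__":
    print(poly_transform_row([2, 3], degree=2))
    print(poly_transform_row([2, 3], degree=2, interaction_only=True))
    print(poly_transform_row([2, 3], degree=3))
    # With the bias column, the full expansion has C(n + degree, degree) terms.
    print(len(poly_term_indices(28, 2)), comb(28 + 2, 2))
```

A test for degree 3 or `interaction_only=True` could compare the converted model's output column by column against a reference like this, alongside the direct comparison with scikit-learn.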
@ksaur I'm working on this.
Welcome! Please reach out with questions as necessary! :)
@ksaur will do.
@ksaur
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from hummingbird.ml import convert
X = np.arange(6).reshape(1, 6)
y = np.random.randint(2, size=6)  # y is ignored by PolynomialFeatures.fit
poly = PolynomialFeatures(1)  # degree 1
poly_x = poly.fit_transform(X)
poly.fit(X, y)
poly_convert = convert(poly, 'pytorch')
This is the error I get when running the code above with degree 1:
Unable to find converter for model type <class 'numpy.ndarray'>. It usually means the pipeline being converted contains a transformer or a predictor with no corresponding converter implemented. Please fill an issue at https://github.com/microsoft/hummingbird.
Is this the expected error? If so, I'll fix it. Otherwise, could you please guide me to the error that should be produced?
If you see this error, it's possible you don't have the latest hummingbird (0.0.6) installed, or that you are feeding it the wrong data type.
For "(1)" above: the error I get when I run your code (and the error I expect to see) is:
---------------------------------------------------------------------------
NotImplementedError Traceback (most recent call last)
<ipython-input-7-84b752131820> in <module>
9 poly.fit(X, y)
10
---> 11 poly_convert = convert(poly, 'pytorch')
12
~/hummingbird/hummingbird/ml/convert.py in convert(model, backend, test_input, device, extra_config)
250 return _convert_onnxml(model, backend, test_input, device, extra_config)
251
--> 252 return _convert_sklearn(model, backend, test_input, device, extra_config)
~/hummingbird/hummingbird/ml/convert.py in _convert_sklearn(model, backend, test_input, device, extra_config)
78
79 # Convert the Topology object into a PyTorch model.
---> 80 hb_model = topology_converter(topology, backend, device, extra_config=extra_config)
81 return hb_model
82
~/hummingbird/hummingbird/ml/_topology.py in convert(topology, backend, device, extra_config)
74 )
75 except Exception as e:
---> 76 raise e
77
78 operators = list(topology.topological_operator_iterator())
~/hummingbird/hummingbird/ml/_topology.py in convert(topology, backend, device, extra_config)
66 extra_config[constants.TREE_IMPLEMENTATION] = "tree_trav"
67
---> 68 operator_map[operator.full_name] = converter(operator, device, extra_config)
69 except ValueError:
70 raise MissingConverter(
~/hummingbird/hummingbird/ml/operator_converters/sklearn/poly_features.py in convert_sklearn_poly_features(operator, device, extra_config)
75
76 if operator.raw_operator.degree != 2:
---> 77 raise NotImplementedError("Hummingbird currently only supports degree 2 for PolynomialFeatures")
78 return PolynomialFeatures(
79 operator.raw_operator.n_input_features_,
NotImplementedError: Hummingbird currently only supports degree 2 for PolynomialFeatures
Please let me know if I misunderstood your question! Hopefully that makes sense!
@ksaur, I have the updated version of hummingbird and was able to reproduce the error you got. As you mentioned, the input was incorrect.
Hi @Hemantr05, just checking in to see if you have any questions? :)
Hi @ksaur, none as of now. I will be creating a PR in a couple of days.
@ksaur Apologies for the delay. I will resolve this by the end of the week.
Just came across this library. Great initiative. However, I second that there is a bug in the polynomial transformer:
This code runs, but the check at the bottom does not evaluate to True. It looks like something is wrong with the polynomial features.
```python
import numpy as np
from hummingbird.ml import convert
from sklearn import pipeline, preprocessing, linear_model

# Create some random data for binary classification
num_classes = 2
N = 1000
X = np.random.rand(N, 28)
y = np.random.randint(num_classes, size=N)

# Create and train a model (a scikit-learn pipeline ending in LinearRegression)
skl_model = pipeline.make_pipeline(
    preprocessing.StandardScaler(),
    preprocessing.PolynomialFeatures(),
    linear_model.LinearRegression(),
)
skl_model.fit(X, y)
y_pred_skl = skl_model.predict(X)

# Use Hummingbird to convert the model to PyTorch
model = convert(skl_model, 'pytorch')

# Run predictions on CPU and compare against scikit-learn
print(np.allclose(model.predict(X), y_pred_skl))
print(model.predict(X) - y_pred_skl)
```
@ereide I'm working on fixing this.
Hi @ereide Welcome!
Thanks for reporting this! PolynomialFeatures is only partially implemented, so yes, it is very likely that there is a bug. Thanks for this example! We will use it in our testing.