fixup scikit-learn PolynomialFeatures

Open ksaur opened this issue 3 years ago • 12 comments

In #269, the basics are implemented, but to be fully complete we need: (1) degree larger than 2 (2) support for interaction_only (3) more tests
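For reference, items (1) and (2) both come down to which products of input columns end up in the output. The following is a minimal standard-library sketch of that enumeration, assuming the column order scikit-learn documents for PolynomialFeatures (bias, then degree-1 terms, then degree-2 terms, and so on). The helper names `poly_terms` and `expand_row` are hypothetical and are not part of Hummingbird or scikit-learn.

```python
from itertools import chain, combinations, combinations_with_replacement
from math import prod


def poly_terms(n_features, degree=2, interaction_only=False, include_bias=True):
    """Return one tuple of feature indices per output column.

    () is the bias column, (0,) is x0, (0, 1) is x0*x1, (0, 0) is x0**2.
    """
    # interaction_only drops every term that repeats a feature (x0**2, x0**2 * x1, ...).
    pick = combinations if interaction_only else combinations_with_replacement
    start = 0 if include_bias else 1
    return list(
        chain.from_iterable(pick(range(n_features), d) for d in range(start, degree + 1))
    )


def expand_row(row, **kwargs):
    """Apply the enumerated terms to a single sample (a plain list of numbers)."""
    return [prod(row[i] for i in term) for term in poly_terms(len(row), **kwargs)]


print(expand_row([2, 3]))                         # [1, 2, 3, 4, 6, 9]
print(expand_row([2, 3], interaction_only=True))  # [1, 2, 3, 6]
print(len(poly_terms(2, degree=3)))               # 10
```

A converter that handles arbitrary degree would need to produce the same columns in the same order, so a list like this could serve as a reference when writing the extra tests in (3).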

ksaur avatar Aug 30 '20 18:08 ksaur

@ksaur working on the same.

Hemantr05 avatar Sep 10 '20 22:09 Hemantr05

Welcome! Please reach out with questions as necessary! :)

ksaur avatar Sep 10 '20 22:09 ksaur

@ksaur will do.

Hemantr05 avatar Sep 11 '20 10:09 Hemantr05

@ksaur

import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from hummingbird.ml import convert

X = np.arange(6).reshape(1, 6)
y = np.random.randint(2, size=6)

poly = PolynomialFeatures(1)
poly_x = poly.fit_transform(X)
poly.fit(X, y)

poly_convert = convert(poly, 'pytorch')

This is the error I get after running the code above for 1 dimension:

Unable to find converter for model type <class 'numpy.ndarray'>. It usually means the pipeline being converted contains a transformer or a predictor with no corresponding converter implemented. Please fill an issue at https://github.com/microsoft/hummingbird.

Is this the expected error? If so, I'll fix it. Otherwise, could you please guide me on which error should be reproduced?

Hemantr05 avatar Sep 13 '20 20:09 Hemantr05

If you see this error, it's possible you don't have the latest hummingbird (0.0.6) installed, or that you are feeding it the wrong data type.

For "(1)" above: the error I get when I run your code (and the error I expect to see) is:

---------------------------------------------------------------------------
NotImplementedError                       Traceback (most recent call last)
<ipython-input-7-84b752131820> in <module>
      9 poly.fit(X, y)
     10 
---> 11 poly_convert = convert(poly, 'pytorch')
     12 

~/hummingbird/hummingbird/ml/convert.py in convert(model, backend, test_input, device, extra_config)
    250         return _convert_onnxml(model, backend, test_input, device, extra_config)
    251 
--> 252     return _convert_sklearn(model, backend, test_input, device, extra_config)

~/hummingbird/hummingbird/ml/convert.py in _convert_sklearn(model, backend, test_input, device, extra_config)
     78 
     79     # Convert the Topology object into a PyTorch model.
---> 80     hb_model = topology_converter(topology, backend, device, extra_config=extra_config)
     81     return hb_model
     82 

~/hummingbird/hummingbird/ml/_topology.py in convert(topology, backend, device, extra_config)
     74             )
     75         except Exception as e:
---> 76             raise e
     77 
     78     operators = list(topology.topological_operator_iterator())

~/hummingbird/hummingbird/ml/_topology.py in convert(topology, backend, device, extra_config)
     66                 extra_config[constants.TREE_IMPLEMENTATION] = "tree_trav"
     67 
---> 68             operator_map[operator.full_name] = converter(operator, device, extra_config)
     69         except ValueError:
     70             raise MissingConverter(

~/hummingbird/hummingbird/ml/operator_converters/sklearn/poly_features.py in convert_sklearn_poly_features(operator, device, extra_config)
     75 
     76     if operator.raw_operator.degree != 2:
---> 77         raise NotImplementedError("Hummingbird currently only supports degree 2 for PolynomialFeatures")
     78     return PolynomialFeatures(
     79         operator.raw_operator.n_input_features_,

NotImplementedError: Hummingbird currently only supports degree 2 for PolynomialFeatures

Please let me know if I misunderstood your question! Hopefully that makes sense!

ksaur avatar Sep 14 '20 02:09 ksaur

@ksaur, I have the updated version of hummingbird and was able to reproduce the error you got. As you mentioned, the input was incorrect.

Hemantr05 avatar Sep 14 '20 23:09 Hemantr05

Hi @Hemantr05, just checking in to see if you have any questions? :)

ksaur avatar Oct 02 '20 17:10 ksaur

Hi @ksaur, none as of now. Will be creating a PR in a couple of days.

Hemantr05 avatar Oct 16 '20 11:10 Hemantr05

@ksaur Apologies for the delay. Will resolve this by the end of the week

Hemantr05 avatar Nov 10 '20 22:11 Hemantr05

Just came across this library. Great initiative. However, I second that there is a bug in the polynomial transformer:

This code runs, but the check at the bottom does not evaluate to True; it looks like something is wrong with the polynomial features.

import numpy as np
from hummingbird.ml import convert
from sklearn import pipeline, preprocessing, linear_model
# Create some random data for binary classification
num_classes = 2
N = 1000
X = np.random.rand(N, 28)
y = np.random.randint(num_classes, size=N)

# Create and train a model (a StandardScaler -> PolynomialFeatures -> LinearRegression pipeline in this case)
skl_model = pipeline.make_pipeline(
  preprocessing.StandardScaler(),
  preprocessing.PolynomialFeatures(),
  linear_model.LinearRegression()
)

skl_model.fit(X, y)

y_pred_skl = skl_model.predict(X) 

#print(y_pred_skl)

# Use Hummingbird to convert the model to PyTorch
model = convert(skl_model, 'pytorch')

# Run predictions on CPU
print(np.allclose(model.predict(X), y_pred_skl))
print(model.predict(X) - skl_model.predict(X))
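The final check above only reports True or False and then dumps the full difference vector. A small helper can summarize how far apart the two outputs are. The following is a minimal standard-library sketch; `compare_predictions` is a hypothetical name, and the pass criterion and default tolerances are chosen to mirror the ones documented for `np.allclose` (|a - b| <= atol + rtol * |b|).

```python
def compare_predictions(expected, actual, rtol=1e-5, atol=1e-8):
    """Summarize element-wise mismatches between two 1-D sequences of numbers.

    An element passes when |actual - expected| <= atol + rtol * |expected|.
    """
    expected = [float(v) for v in expected]
    actual = [float(v) for v in actual]
    if len(expected) != len(actual):
        raise ValueError("length mismatch: %d vs %d" % (len(expected), len(actual)))

    # Absolute error per element, and the indices that exceed the tolerance.
    errors = [abs(a - e) for e, a in zip(expected, actual)]
    failing = [
        i for i, (e, err) in enumerate(zip(expected, errors))
        if err > atol + rtol * abs(e)
    ]
    worst = max(range(len(errors)), key=errors.__getitem__) if errors else None
    return {
        "all_close": not failing,
        "n_failing": len(failing),
        "worst_index": worst,
        "max_abs_error": errors[worst] if errors else 0.0,
    }
```

With the names from the report above, this would be called as `compare_predictions(y_pred_skl, model.predict(X))`. A handful of failing entries with a tiny maximum error would point at floating-point precision, while large errors on every row would point at the converted features themselves.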

ereide avatar Nov 11 '20 08:11 ereide

@ereide working on fixing the same.

Hemantr05 avatar Nov 11 '20 14:11 Hemantr05

Hi @ereide Welcome!

Thanks for reporting this! PolynomialFeatures is only partially implemented, so yes it is very likely that there is a bug. Thanks for this example! We will use it in our testing.

ksaur avatar Nov 11 '20 16:11 ksaur