Add the ability to pickle TPOTRegressor object

Open RafeyIqbalRahman opened this issue 4 years ago • 32 comments

The following text is more of a feature request than a bug report.

I had been using Flask to build a web app before trying TPOT. The app loads the model from a pickle file to make predictions and display the result. However, since the TPOT object cannot be pickled, the web app no longer works.

It would be great if the ability to pickle the TPOTRegressor object were added.

RafeyIqbalRahman, Oct 10 '20 18:10

I guess you can just pickle the final pipeline object (tpot.fitted_pipeline_), which is a normal sklearn pipeline and can be used independently of tpot.

hanshupe, Oct 11 '20 12:10

I tried that, but it raised an AttributeError.

RafeyIqbalRahman, Oct 11 '20 15:10

Where did you get the error? I tried it and it worked.

hanshupe, Oct 11 '20 15:10

Can you show the code that works?

RafeyIqbalRahman, Oct 11 '20 16:10

After fitting your tpot object, just call:

import pickle
pipeline_dump = pickle.dumps(tpot.fitted_pipeline_)
pipeline = pickle.loads(pipeline_dump)
print(pipeline)

hanshupe, Oct 11 '20 16:10
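The snippet above round-trips the pipeline in memory. A Flask app usually needs the pickle on disk instead: the training script writes it, and the app reads it back once at startup. The sketch below assumes scikit-learn and NumPy are installed, uses a small make_pipeline(StandardScaler(), LinearRegression()) as a stand-in for tpot.fitted_pipeline_, and writes to a hypothetical model.pkl in the temp directory.

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Stand-in for tpot.fitted_pipeline_ (assumption: any fitted sklearn
# Pipeline pickles the same way). Two features, e.g. longitude and latitude.
rng = np.random.default_rng(0)
X = rng.uniform(-10.0, 10.0, size=(18, 2))
y = X @ np.array([1.5, -0.5]) + 3.0
pipeline = make_pipeline(StandardScaler(), LinearRegression()).fit(X, y)

# Training script: write the fitted pipeline to a file (hypothetical path).
model_path = os.path.join(tempfile.gettempdir(), "model.pkl")
with open(model_path, "wb") as f:
    pickle.dump(pipeline, f)

# Web app: load the pipeline and predict on 2D input (one row, two columns).
with open(model_path, "rb") as f:
    model = pickle.load(f)

sample = np.array([[2.0, 4.0]])
prediction = model.predict(sample)
print(prediction)
```

Load the file with the same library versions that wrote it; scikit-learn warns when unpickling across versions. Pipelines found by TPOT can also contain TPOT's own operators (for example tpot.builtins.StackingEstimator), so the environment that runs the Flask app may need tpot installed; a class that cannot be found at load time surfaces as a ModuleNotFoundError or AttributeError from pickle.load.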

Thanks for the code, it worked. But when I load the model in Flask, I get a strange error:

ValueError: matmul: Input operand 1 has a mismatch in its core dimension 0, with gufunc signature

The web app I am creating takes 2 parameters, longitude and latitude, to show a prediction.

Here is the Colab notebook link: https://colab.research.google.com/drive/178OrUuWEigZC-y2A2O83AVDGZdwxIsoW?usp=sharing

RafeyIqbalRahman, Oct 11 '20 18:10

How do you call the pipeline when making predictions, and what are the shape and type of your input data?

You can make predictions by calling pipeline.predict(data), where data can be a data frame or matrix with the same number of columns as you used during training.

(Your link can't be accessed.)

hanshupe, Oct 11 '20 18:10
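Before calling predict, it can help to fail early with a readable message when the column count is wrong. A minimal sketch assuming NumPy; check_n_features is a hypothetical helper, and n_features is the number of training columns (2 in this thread). Fitted scikit-learn estimators from version 0.24 on also record this number as n_features_in_.

```python
import numpy as np


def check_n_features(data, n_features):
    """Convert data to a 2D float array and confirm its column count.

    n_features is the number of columns the pipeline was trained on.
    """
    X = np.asarray(data, dtype=float)
    if X.ndim != 2:
        raise ValueError(f"expected a 2D array, got shape {X.shape}")
    if X.shape[1] != n_features:
        raise ValueError(
            f"expected {n_features} columns, got {X.shape[1]} (shape {X.shape})"
        )
    return X


print(check_n_features([[2.0, 4.0]], n_features=2).shape)  # (1, 2)
```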

I loaded the pipeline using pickle.load. The shape of the data is 18x3 and the data type is float. The link is accessible now. I'll try pipeline.predict to make predictions.

RafeyIqbalRahman, Oct 11 '20 18:10

You called .predict(y), but you have to call .predict(X).

hanshupe, Oct 11 '20 18:10

You are right. I was actually trying to find the reason for the error, and it turns out that .predict(y) gives the same error I mentioned above. However, in the Flask code I'm feeding X and still getting the same error as with .predict(y).

RafeyIqbalRahman, Oct 11 '20 18:10

As said above, the number of columns in your input data must exactly match the number used during training. You pass a list, so it's treated as just 1 column with two rows. You need to do something like np.column_stack((a, b)), but there are different ways.

hanshupe, Oct 11 '20 19:10
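To make the np.column_stack((a, b)) suggestion concrete: given two scalars it produces one row with two columns, and given two 1D arrays of equal length it produces one row per sample. A sketch assuming NumPy, with made-up longitude and latitude values.

```python
import numpy as np

# One sample: two scalars become one row with two columns.
lon, lat = 2.0, 4.0
one_row = np.column_stack((lon, lat))
print(one_row.shape)  # (1, 2)

# Many samples: two 1D arrays become the columns of an (n, 2) matrix.
lons = np.array([2.0, 3.5, -1.0])
lats = np.array([4.0, 0.5, 7.25])
many_rows = np.column_stack((lons, lats))
print(many_rows.shape)  # (3, 2)
```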

Can you show a practical example of reshaping the data? Or could you make the change in the linked notebook so I can get an idea?

RafeyIqbalRahman, Oct 11 '20 19:10

np.column_stack([list-variable])

hanshupe, Oct 11 '20 19:10
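One likely reason np.column_stack([list-variable]) "doesn't work": the extra brackets make NumPy treat the list as a single 1D array, and column_stack turns each 1D array into a column, giving two rows with one feature each. Passing the list itself makes each scalar its own column. A sketch assuming NumPy; values is a hypothetical [longitude, latitude] list.

```python
import numpy as np

values = [2.0, 4.0]  # e.g. [longitude, latitude] for one sample

# Extra brackets: the whole list becomes ONE column.
wrong = np.column_stack([values])
print(wrong.shape)  # (2, 1): two rows, one feature

# Pass the list itself: each scalar becomes its own column.
right = np.column_stack(values)
print(right.shape)  # (1, 2): one row, two features
```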

This doesn't work. Also, I'm getting an error message saying that the features should be 2D while the target should be 1D.

RafeyIqbalRahman, Oct 11 '20 19:10

Just print the type and shape of your input. There are many threads on Stack Overflow about how to convert it into 2D.

hanshupe, Oct 11 '20 19:10
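Printing the type and shape, as suggested, usually pinpoints the problem. The error about features needing to be 2D presumably comes from scikit-learn's input validation, whose message recommends array.reshape(1, -1) when the data contains a single sample. A sketch assuming NumPy; describe is a hypothetical helper.

```python
import numpy as np


def describe(name, obj):
    """Print the type and shape of an input before it reaches predict()."""
    print(f"{name}: {type(obj).__name__}, shape {np.shape(obj)}")


values = [2.0, 4.0]  # one sample as a plain list
describe("values", values)  # values: list, shape (2,)

# For a single sample, reshape(1, -1) gives one row; -1 lets NumPy infer
# the number of columns.
X_one = np.asarray(values, dtype=float).reshape(1, -1)
describe("X_one", X_one)  # X_one: ndarray, shape (1, 2)
```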

When converted into 2D, this is what I got.

[screenshot]

RafeyIqbalRahman, Oct 11 '20 19:10

Also, as seen here, TPOT lacks multi-label regression ability.

RafeyIqbalRahman, Oct 11 '20 19:10

Do you have multiple output variables? If not, it should work. Just check the shapes of your input X and your target y.

hanshupe, Oct 11 '20 20:10

No, I have a single output variable. The shape of X is (18, 2) and that of y is (18, 1).

RafeyIqbalRahman, Oct 11 '20 20:10

But then it's not multi-label.

hanshupe, Oct 11 '20 20:10

By the way, I recommend restricting access to your notebook again to avoid security issues.

hanshupe, Oct 11 '20 20:10

So what's the solution now?

RafeyIqbalRahman, Oct 12 '20 17:10

As explained, pickling is possible with tpot, and you don't have a multi-label problem here; you just have to shape your data correctly.

hanshupe, Oct 12 '20 17:10

@hanshupe thank you for answering the question here. @RafeyIqbalRahman I think y's shape should be (18,) instead of (18, 1) if y is a 1-D array.

weixuanfu, Oct 14 '20 17:10
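Following weixuanfu's note, a (18, 1) target can be turned into shape (18,) with ravel; flatten, used later in the thread, gives the same values as a copy. A sketch assuming NumPy, with placeholder target values.

```python
import numpy as np

# Placeholder target stored as a single column, shape (18, 1).
y_col = np.arange(18, dtype=float).reshape(18, 1)

# ravel() returns a 1D view when possible; flatten() always returns a copy.
y = y_col.ravel()
y_copy = y_col.flatten()
print(y_col.shape, y.shape, y_copy.shape)  # (18, 1) (18,) (18,)
```

If y comes from a pandas DataFrame, selecting one column as df["target"] (a Series) already gives a 1D target, whereas df[["target"]] gives a 2D frame; "target" is a hypothetical column name. Reshaping y only affects training: the input passed to predict in the Flask app still needs shape (n_samples, 2).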

I reshaped y using .flatten, but the Flask app still doesn't show the predicted result.

RafeyIqbalRahman, Oct 16 '20 06:10

@RafeyIqbalRahman I cannot check your Colab notebook (maybe a permission issue?) and am not sure why the Flask app is not working. Could you please provide a demo that reproduces the issue with pickling tpot.fitted_pipeline_?

weixuanfu, Oct 16 '20 13:10

@weixuanfu this is the link to my Colab notebook: https://colab.research.google.com/drive/1jVRhIZEV8rjdsQFPvWJof9R7wJcdF-cd?usp=sharing. I suspect there's an issue with the pickle file, which is why the Flask app is not working.

RafeyIqbalRahman, Oct 16 '20 15:10

I had a quick look. I think final_features has only 1 feature, but the model was fitted with 2 features (X.shape=(18,2)), which is why the prediction fails. You can add a line like print(final_features.shape) before prediction = model.predict(final_features) to check that.

weixuanfu, Oct 16 '20 16:10

Thanks. Since final_features is a list, I used len(final_features) to get its length instead. The length turned out to be 1, while the model is fitted with 2 features. How can I solve this?

RafeyIqbalRahman, Oct 21 '20 17:10
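On the len(final_features) result: if final_features is built as [np.array(values)] (a common pattern in Flask tutorials, and only an assumption about this notebook), len counts rows, so 1 is expected even when both features are present. Converting with np.asarray shows the real shape; a second dimension of 1 would mean only one value was collected. A sketch assuming NumPy.

```python
import numpy as np

# Assumption: final_features follows the [np.array(values)] pattern.
values = [2.0, 4.0]
final_features = [np.array(values)]

print(len(final_features))               # 1: number of rows, not features
print(np.asarray(final_features).shape)  # (1, 2): one row, two features
```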

When I tried to reshape final_features, I got a ValueError saying that an array of size 1 cannot be reshaped into shape (1,2).

RafeyIqbalRahman, Oct 22 '20 07:10
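That reshape error means the array holds a single number, so only one value reached final_features, and no reshape can turn one value into two columns. The sketch below reads each input by name rather than relying on the order of request.form.values(). It assumes NumPy; the field names longitude and latitude and the helper features_from_form are hypothetical. Flask's request.form is dict-like and supports .get, so it could be passed in directly; a plain dict is used here so the sketch runs without Flask.

```python
import numpy as np

# Hypothetical form field names, listed in the same order as the training columns.
FIELDS = ("longitude", "latitude")


def features_from_form(form):
    """Build a (1, 2) float array from a dict-like form such as request.form."""
    missing = [name for name in FIELDS if not form.get(name)]
    if missing:
        raise ValueError(f"missing form fields: {missing}")
    row = [float(form.get(name)) for name in FIELDS]
    return np.array(row).reshape(1, len(FIELDS))


X_one = features_from_form({"longitude": "2.0", "latitude": "4.0"})
print(X_one.shape)  # (1, 2)
```

In a Flask view this would be used along the lines of prediction = model.predict(features_from_form(request.form)).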