
ValueError: unsupported pickle protocol: 5 while ensemble=True

Open · prateekgml opened this issue on Aug 12, 2022 · 21 comments

Hi,

I am trying to use the ensemble method while training a dataset, as below-

[screenshot: training code with ensemble enabled]

The training went fine, but at the end, after the following line- [flaml.automl: 08-12 12:34:01] {3407} WARNING - Using passthrough=False for ensemble because the data contain categorical features.

I am getting the below-

[screenshot: log output]

And then the following error occurred-

[screenshots: error traceback ending in ValueError: unsupported pickle protocol: 5]

Without ensemble, I was able to complete the training.

My dataset info is as below-

[screenshots: dataset info]

Please help me to resolve this issue.

prateekgml · Aug 12 '22

The error message (https://user-images.githubusercontent.com/97145738/184356110-52f5f468-2d53-4e7d-bfea-c1943e18dc86.png) suggests that you don't have enough RAM to build the ensemble. You can try specifying a simple final_estimator, e.g.,

automl.fit(
    X_train, y_train, task="classification",
    ensemble={
        "final_estimator": LogisticRegression(),
        "passthrough": False,
    },
)

sonichi · Aug 12 '22

@sonichi is this syntax correct-

[screenshot: attempted syntax]

And what should I import to resolve this error with the above syntax-

[screenshot: error message]

prateekgml · Aug 13 '22

I was able to proceed after adding from sklearn.linear_model import LogisticRegression, but since my problem is a regression, I should use LinearRegression() as the final estimator.
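
Something like this should work for my case (a sketch, assuming the same automl, X_train, and y_train as above):

from sklearn.linear_model import LinearRegression

automl.fit(
    X_train, y_train, task="regression",
    ensemble={
        # a simple linear model as the final stacker keeps memory usage low
        "final_estimator": LinearRegression(),
        "passthrough": False,
    },
)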

prateekgml · Aug 13 '22

Now the training completed, but a RAM error occurred, as below-

[screenshot: RAM error]

It seems ensembling is not possible with FLAML.

prateekgml · Aug 13 '22


How large is the dataset, and how much free RAM is there? Ensembling requires more RAM than training each single model. One thing you can try is to use a smaller time budget for tuning so that small models will be used for ensembling.
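
For example (a sketch; the regression task and variable names are assumed from your earlier snippet):

from flaml import AutoML
from sklearn.linear_model import LinearRegression

automl = AutoML()
automl.fit(
    X_train, y_train, task="regression",
    time_budget=3600,  # a smaller budget tends to select smaller models
    ensemble={
        "final_estimator": LinearRegression(),
        "passthrough": False,
    },
)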

sonichi · Aug 13 '22


The dataset is not so big, as you can see here-

[screenshot: dataset info]

With "time_budget\": 10000*10, I am able to continue the training in Kaggle. Since its execution time crossed 12 hours limit, it stopped further.

I will continue my experiment with Colab Pro+, which has a Tesla P100 16 GB GPU and 54 GiB of RAM. I will inform you once the investigation is done.

prateekgml · Aug 15 '22


Thanks. Have you tried using a smaller time budget to test whether the ensemble works? One other question: did you feed the raw data into AutoML, or did you do one-hot encoding before it? One-hot encoding could blow up the size of the dataset and should be avoided.
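
For example, categorical columns can be left as raw strings or pandas categories (a sketch with made-up columns):

import pandas as pd

# leave categorical features raw; FLAML encodes them internally
X_train = pd.DataFrame({
    "city": ["NY", "LA", "NY", "SF"],  # raw categorical feature
    "age": [25, 32, 47, 51],           # numeric feature
})
# avoid pd.get_dummies(X_train); one-hot encoding can blow up the column count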

sonichi · Aug 15 '22

Yes, I started the experiment with 3600, but then ensembling was covering 96%, so I increased the time budget. And I am feeding raw data into FLAML.

prateekgml · Aug 15 '22


Do you mean the ensemble works in 96% of cases, or that it uses 96% of the RAM, when using 3600?

sonichi · Aug 15 '22

I mean that with the lower time budget the search was at 96%, and the log message said to increase the time budget to complete the search to 100%.

prateekgml · Aug 15 '22


Did the ensemble succeed with the lower time budget?

sonichi · Aug 16 '22

Yes, training completed with the below message-

[screenshot: completion message]

It seems the hyperparameter search did not converge for all estimators, and thus the score of my prediction model is poorer than the single best model from FLAML. With the ensemble the score is 83.4, while with the single best model it was 83.55.

prateekgml · Aug 16 '22

Please suggest some steps to improve the result using ensembling. My current settings are as below-

[screenshot: current settings]

prateekgml · Aug 16 '22

If you have enough RAM now, try removing the key "final_estimator" from "ensemble".
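
That is, something like (a sketch):

automl.fit(
    X_train, y_train, task="regression",
    ensemble={
        # no "final_estimator" key, so FLAML falls back to its default stacker
        "passthrough": False,
    },
)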

sonichi · Aug 16 '22

I have started the training with the below changes in Colab Pro+, which has 54 GiB of RAM-

[screenshot: updated settings]

Can you please tell me whether there is any setting to save the whole log to a file? Currently it is not saving all the logs.

prateekgml · Aug 16 '22

Use log_type="all". https://microsoft.github.io/FLAML/docs/Use-Cases/Task-Oriented-AutoML#log-the-trials
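
For example (the log file name is just an illustration):

automl.fit(
    X_train, y_train, task="regression",
    log_file_name="flaml.log",  # write the trial log to this file
    log_type="all",             # record all trials, not only the improving ones
)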

sonichi · Aug 16 '22

I did an experiment with the ensemble line commented out, like below-

[screenshot: settings with the ensemble line commented out]

And the score dropped to 83.3.

prateekgml · Aug 16 '22

@sonichi as per your suggestion, with the below changes training completed in 3467.75 seconds-

[screenshots: updated settings and result]

And the score is 83.54, which is less than without an ensemble (83.55).

Can you please suggest any other steps to improve the ensembling?

prateekgml · Aug 16 '22

@sonichi I tried adding more estimators to the ensemble, as below-

[screenshot: settings with more estimators]

And my score improved to 83.968 from 83.55. Is there anything I can try to improve the score?
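
The change was along these lines (a sketch; the exact estimator list I used is in the screenshot above):

automl.fit(
    X_train, y_train, task="regression",
    estimator_list=["lgbm", "xgboost", "rf", "extra_tree", "catboost"],  # FLAML built-in learners
    ensemble={"passthrough": False},
)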

prateekgml · Aug 17 '22


What about using all the estimators, by removing the estimator_list argument?

sonichi · Aug 18 '22

After removing the estimator_list, the score dropped to 83.92 from 83.96.

prateekgml · Aug 19 '22