
ValueError: unsupported pickle protocol: 5 while ensemble=True

Open · prateekgml opened this issue on Aug 12, 2022 · 21 comments

Hi,

I am trying to use the ensemble method while training a dataset, as below-

[screenshot: training code with ensemble enabled]

The training went fine, but at the end, after the following line- [flaml.automl: 08-12 12:34:01] {3407} WARNING - Using passthrough=False for ensemble because the data contain categorical features.

I am getting the below-

[screenshot: log output]

And then the following error occurred-

[screenshots: error traceback ending in ValueError: unsupported pickle protocol: 5]

Without ensemble, I was able to complete the training.

My dataset info is as below-

[screenshots: dataset info]

Please help me to resolve this issue.

prateekgml · Aug 12 '22

The error message (https://user-images.githubusercontent.com/97145738/184356110-52f5f468-2d53-4e7d-bfea-c1943e18dc86.png) suggests that you don't have enough RAM to build the ensemble. You can try specifying a simple final_estimator, e.g.,

automl.fit(
    X_train, y_train, task="classification",
    ensemble={
        "final_estimator": LogisticRegression(),
        "passthrough": False,
    },
)

sonichi · Aug 12 '22

@sonichi is this syntax correct-

[screenshot: attempted syntax]

And what should I import to resolve this error with the above syntax-

[screenshot: error message]

prateekgml · Aug 13 '22

I was able to proceed after adding from sklearn.linear_model import LogisticRegression, but since my problem is a regression, I should use LinearRegression() as the final estimator.
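
Something like this should work for my case (a sketch, assuming the same automl, X_train, and y_train as above):

from sklearn.linear_model import LinearRegression

automl.fit(
    X_train, y_train, task="regression",
    ensemble={
        # a simple linear model as the final stacker keeps memory usage low
        "final_estimator": LinearRegression(),
        "passthrough": False,
    },
)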

prateekgml · Aug 13 '22

Now the training completed, but a RAM error occurred, as below-

[screenshot: RAM error]

It seems ensembling is not possible with FLAML.

prateekgml · Aug 13 '22


How large is the dataset, and how much free RAM is there? Ensembling requires more RAM than training each single model. One thing you can try is to use a smaller time budget for tuning so that small models will be used for ensembling.
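
For example (a sketch; the regression task and variable names are assumed from your earlier snippet):

from flaml import AutoML
from sklearn.linear_model import LinearRegression

automl = AutoML()
automl.fit(
    X_train, y_train, task="regression",
    time_budget=3600,  # a smaller budget tends to select smaller models
    ensemble={
        "final_estimator": LinearRegression(),
        "passthrough": False,
    },
)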

sonichi · Aug 13 '22


The dataset is not so big, as you can see here-

[screenshot: dataset info]

With "time_budget\": 10000*10, I am able to continue the training in Kaggle. Since its execution time crossed 12 hours limit, it stopped further.

I will continue my experiment with Colab Pro+, which has a Tesla P100 16 GB GPU and 54 GiB of RAM. I will inform you once the investigation is done.

prateekgml · Aug 15 '22


Thanks. Have you tried using a smaller time budget to test whether the ensemble works? One other question: did you feed the raw data into AutoML, or did you do one-hot encoding before it? One-hot encoding could blow up the size of the dataset and should be avoided.
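
For example, categorical columns can be left as raw strings or pandas categories (a sketch with made-up columns):

import pandas as pd

# leave categorical features raw; FLAML encodes them internally
X_train = pd.DataFrame({
    "city": ["NY", "LA", "NY", "SF"],  # raw categorical feature
    "age": [25, 32, 47, 51],           # numeric feature
})
# avoid pd.get_dummies(X_train); one-hot encoding can blow up the column count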

sonichi · Aug 15 '22

Yes, I started the experiment with 3600, but then ensembling was covering 96%, so I increased the time budget. And I am feeding raw data into FLAML.

prateekgml · Aug 15 '22


Do you mean the ensemble works in 96% of cases, or that it uses 96% of the RAM, when using 3600?

sonichi · Aug 15 '22

I mean that with the lower time budget the search was at 96%, and the log message said to increase the time budget to complete the search to 100%.

prateekgml · Aug 15 '22


Did the ensemble succeed with the lower time budget?

sonichi · Aug 16 '22

Yes, training completed with the below message-

[screenshot: completion message]

It seems the hyperparameter search did not converge for all estimators, and thus the score of my prediction model is poorer than the single best model from FLAML. With the ensemble the score is 83.4, while with the single best model it was 83.55.

prateekgml · Aug 16 '22

Please suggest some steps to improve the result using ensembling. My current settings are as below-

[screenshot: current settings]

prateekgml · Aug 16 '22

If you have enough RAM now, try removing the key "final_estimator" from "ensemble".
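
That is, something like (a sketch):

automl.fit(
    X_train, y_train, task="regression",
    ensemble={
        # no "final_estimator" key, so FLAML falls back to its default stacker
        "passthrough": False,
    },
)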

sonichi · Aug 16 '22

I have started the training with the below changes in Colab Pro+, which has 54 GiB of RAM-

[screenshot: updated settings]

Can you please tell me whether there is any setting to save the whole log to a file? Currently it is not saving all the logs.

prateekgml · Aug 16 '22

Use log_type="all". https://microsoft.github.io/FLAML/docs/Use-Cases/Task-Oriented-AutoML#log-the-trials
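
For example (the log file name is just an illustration):

automl.fit(
    X_train, y_train, task="regression",
    log_file_name="flaml.log",  # write the trial log to this file
    log_type="all",             # record all trials, not only the improving ones
)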

sonichi · Aug 16 '22

I did an experiment with the ensemble line commented out, like below-

[screenshot: settings with the ensemble line commented out]

And the score dropped to 83.3.

prateekgml · Aug 16 '22

@sonichi as per your suggestion, with the below changes training completed in 3467.75 seconds-

[screenshots: updated settings and result]

And the score is 83.54, which is less than without an ensemble (83.55).

Can you please suggest any other steps to improve the ensembling?

prateekgml · Aug 16 '22

@sonichi I tried adding more estimators to the ensemble, as below-

[screenshot: settings with more estimators]

And my score improved to 83.968 from 83.55. Is there anything I can try to improve the score?
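
The change was along these lines (a sketch; the exact estimator list I used is in the screenshot above):

automl.fit(
    X_train, y_train, task="regression",
    estimator_list=["lgbm", "xgboost", "rf", "extra_tree", "catboost"],  # FLAML built-in learners
    ensemble={"passthrough": False},
)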

prateekgml · Aug 17 '22


What about using all the estimators, by removing the estimator_list argument?

sonichi · Aug 18 '22

After removing the estimator_list, the score dropped to 83.92 from 83.96.

prateekgml · Aug 19 '22