
best_model_for_estimator() returns empty objects for some estimators

Open · flippercy opened this issue 3 years ago · 14 comments

Hi @sonichi:

I just found out that after running automl with several customized estimators, best_model_for_estimator() returned an empty object for certain estimators even though a few models had been built with them. When I tried to save the best model, I got an error message like the one below:

AttributeError: 'NoneType' object has no attribute 'save_model'

Do you know why? My team has been using FLAML for quite a while, so it is not due to coding errors or the time budget. The dataset is big; however, we have used similar datasets with no issue before. Our FLAML is the latest version, 0.9.5, and we set model_history = True.
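For reference, the failing pattern looks roughly like the sketch below; the built-in catboost learner, the file name, and the sklearn dataset are stand-ins for our custom estimators and data:

```python
from flaml import AutoML
from sklearn.datasets import load_breast_cancer

X, y = load_breast_cancer(return_X_y=True)  # stand-in for our real dataset

automl = AutoML()
automl.fit(X, y, task="classification", time_budget=60, model_history=True)

# best_model_for_estimator() sometimes returns None even though models
# were built for the estimator; calling save_model on it then raises
# the AttributeError quoted above.
best = automl.best_model_for_estimator("catboost")
if best is not None:
    best.model.save_model("best_catboost.cbm")
```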

Thank you.

flippercy commented on Jan 27, 2022

In addition, during the training I noticed some discrepancies in the output as shown below:

[screenshot: FLAML console output showing the estimator-name discrepancy]

'MonotonicCatboost' and 'MonotonicXgboostDart' are my customized classifiers. In this case, after training a monotonic catboost, shouldn't the program return the best performance of MonotonicCatboost instead of MonotonicXgboostDart?

flippercy commented on Jan 27, 2022

> In addition, during the training I noticed some discrepancies in the output as shown below:
>
> [screenshot: FLAML console output showing the estimator-name discrepancy]
>
> 'MonotonicCatboost' and 'MonotonicXgboostDart' are my customized classifiers. In this case, after training a monotonic catboost, shouldn't the program return the best performance of MonotonicCatboost instead of MonotonicXgboostDart?

What's the search space for "MonotonicCatboost"? One possible reason is that the sampler fails to find a new config for MonotonicCatboost, so it is skipped. And for some reason some lines in the console log are missing.
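(For context, a custom learner defines its space via the search_space classmethod. A rough sketch of such a class, assuming it extends flaml.model.CatBoostEstimator; the actual monotone-constraint logic is the user's and is omitted here:)

```python
from flaml import tune
from flaml.model import CatBoostEstimator

class MonotonicCatboost(CatBoostEstimator):
    """Sketch of a custom learner; the real monotone constraints are omitted."""

    @classmethod
    def search_space(cls, data_size, **params):
        # start from the built-in CatBoost space and adjust dimensions as needed
        space = super().search_space(data_size=data_size, **params)
        space["depth"] = {
            "domain": tune.randint(lower=2, upper=9),
            "init_value": 4,
            "low_cost_init_value": 2,
        }
        return space
```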

sonichi commented on Jan 27, 2022

> Hi @sonichi:
>
> I just found out that after running automl with several customized estimators, best_model_for_estimator() returned an empty object for certain estimators even though a few models had been built with them. When I tried to save the best model, I got an error message like the one below:
>
> AttributeError: 'NoneType' object has no attribute 'save_model'
>
> Do you know why? My team has been using FLAML for quite a while, so it is not due to coding errors or the time budget. The dataset is big; however, we have used similar datasets with no issue before. Our FLAML is the latest version, 0.9.5, and we set model_history = True.
>
> Thank you.

If you make the estimator_list contain a single estimator that causes this issue, do you get the same problem?
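Something along these lines, where MonotonicCatboost is a placeholder for your custom estimator class (e.g., the sketch above), and the sklearn dataset is placeholder data:

```python
from flaml import AutoML
from sklearn.datasets import load_breast_cancer

X, y = load_breast_cancer(return_X_y=True)  # placeholder data

automl = AutoML()
# MonotonicCatboost is a placeholder for the custom estimator class
automl.add_learner("MonotonicCatboost", MonotonicCatboost)
automl.fit(X, y, task="classification", time_budget=600,
           estimator_list=["MonotonicCatboost"], model_history=True)

# should be a fitted estimator rather than None
print(automl.best_model_for_estimator("MonotonicCatboost"))
```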

sonichi commented on Jan 27, 2022

Hi @sonichi:

There is no problem if I run automl() with only the estimator causing the issue.

Moreover, my observations include:

  1. The issue is not data-specific. A coworker ran automl() with a different and much smaller dataset and got the same issue.
  2. The issue is somewhat random. Later yesterday I re-ran automl() with fewer CPUs and somehow it worked without any problem. Not sure whether n_jobs was the reason or it was just luck.
  3. It is probably related to some recent updates? I have personally used FLAML a lot for similar cases and never encountered this problem before.

Thank you.

flippercy commented on Jan 27, 2022

> In addition, during the training I noticed some discrepancies in the output as shown below: [screenshot] 'MonotonicCatboost' and 'MonotonicXgboostDart' are my customized classifiers. In this case, after training a monotonic catboost, shouldn't the program return the best performance of MonotonicCatboost instead of MonotonicXgboostDart?
>
> What's the search space for "MonotonicCatboost"? One possible reason is that the sampler fails to find a new config for MonotonicCatboost, so it is skipped. And for some reason some lines in the console log are missing.

The search space for MonotonicCatboost is quite big and should not be the reason for the discrepancy; more catboost models were built later in that search and the issue did not happen again. It is probably just a one-time glitch, but I wanted to let you know.

flippercy commented on Jan 27, 2022

> Hi @sonichi:
>
> There is no problem if I run automl() with only the estimator causing the issue.
>
> Moreover, my observations include:
>
> 1. The issue is not data-specific. A coworker ran automl() with a different and much smaller dataset and got the same issue.
> 2. The issue is somewhat random. Later yesterday I re-ran automl() with fewer CPUs and somehow it worked without any problem. Not sure whether n_jobs was the reason or it was just luck.
> 3. It is probably related to some recent updates? I have personally used FLAML a lot for similar cases and never encountered this problem before.
>
> Thank you.

Good that it's not data-specific. Bad that it's random. It happens to the custom estimator only, right?

sonichi commented on Jan 27, 2022

> Hi @sonichi: There is no problem if I run automl() with only the estimator causing the issue. Moreover, my observations include:
>
> 1. The issue is not data-specific. A coworker ran automl() with a different and much smaller dataset and got the same issue.
> 2. The issue is somewhat random. Later yesterday I re-ran automl() with fewer CPUs and somehow it worked without any problem. Not sure whether n_jobs was the reason or it was just luck.
> 3. It is probably related to some recent updates? I have personally used FLAML a lot for similar cases and never encountered this problem before.
>
> Thank you.
>
> Good that it's not data-specific. Bad that it's random. It happens to the custom estimator only, right?

I am not sure, since we usually use customized estimators only; we might run some tests with the default estimators later.

flippercy commented on Jan 28, 2022

I haven't seen this issue before, so there might be some unexpected behavior in the custom estimators. Could you use log_type="all", check the logged results, and see if there is any anomaly?
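i.e., something like the following (the log file name is just an example; each line of the training log should be a JSON record of one trial):

```python
from flaml import AutoML
from sklearn.datasets import load_breast_cancer

X, y = load_breast_cancer(return_X_y=True)  # placeholder data

automl = AutoML()
automl.fit(X, y, task="classification", time_budget=60,
           log_type="all",  # record every trial, not only the improvements
           log_file_name="flaml_all.log")

# scan the per-trial log records (one JSON object per line) for anomalies
with open("flaml_all.log") as f:
    for line in f:
        print(line.rstrip())
```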

sonichi commented on Jan 28, 2022

@sonichi this issue has not happened since then, so I will close the thread for now. We may discuss it later if it appears again.

Thank you very much for the help!

flippercy commented on Feb 2, 2022

I noticed this issue was closed, but I just encountered the same problem with a default estimator.

[automl.best_model_for_estimator(e) for e in automl.estimator_list]

returns

[<flaml.model.LGBMEstimator at 0x260821d54c0>,
 <flaml.model.RandomForestEstimator at 0x260821f6fd0>,
 <flaml.model.CatBoostEstimator at 0x260821f66a0>,
 <flaml.model.XGBoostSklearnEstimator at 0x260821d5a00>,
 <flaml.model.ExtraTreesEstimator at 0x260821f6f10>,
 None]

where the last estimator should be xgb_limitdepth, which is in automl.estimator_list.

Since this did not happen for a colleague with the same estimators and data, it might in fact be somewhat random.

TimSchim commented on May 30, 2022

And I can confirm that this issue still exists for me.

flippercy commented on May 30, 2022

Found the source of this for my case. It seems that xgb_limitdepth gets relatively few resources compared to the other estimators, which leads to no model being fitted for it. Increasing the time_budget solved it for me.

Edit: This only reduces the chance of this problem.

TimSchim commented on May 30, 2022

Thanks @TimSchim @flippercy. There is no guarantee that every estimator in the estimator list will be trained within the time budget. One way to increase the priority of a particular estimator is to redefine the cost_relative2lgbm() function for the corresponding estimator class, as in the sketch below. The lower the cost, the higher the priority.
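A minimal sketch, assuming the built-in class behind xgb_limitdepth is flaml.model.XGBoostLimitDepthEstimator:

```python
from flaml import AutoML
from flaml.model import XGBoostLimitDepthEstimator

class CheapXGBLimitDepth(XGBoostLimitDepthEstimator):
    @classmethod
    def cost_relative2lgbm(cls):
        # estimated cost relative to LGBM; a lower value raises the
        # estimator's priority in the search
        return 1.0

automl = AutoML()
automl.add_learner("xgb_limitdepth", CheapXGBLimitDepth)
```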

sonichi commented on Jun 22, 2022

Hi @sonichi:

It's been a while and I hope you are doing well. Glad to see your package, FLAML, attracting so much attention; it already has almost 2k stars.

I have to follow up on this issue again because it has never really been solved and we have no clue what causes it. During the last six months, my team has been using several versions of FLAML with various datasets on different platforms, and we still occasionally hit this issue: a certain estimator is trained by FLAML but not saved in the output. Based on the total_iter values returned by _search_states.items(), the affected estimator had been trained many times (usually 50-100); however, best_model_for_estimator() still returned nothing but an empty object for it.
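For reference, the check looks roughly like this (it touches FLAML internals, so attribute names may differ between versions):

```python
# compare how often each estimator was trained with whether a model
# was actually retained for it
for name, state in automl._search_states.items():
    best = automl.best_model_for_estimator(name)
    print(f"{name}: total_iter={state.total_iter}, "
          f"model retained={best is not None}")
```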

The only fix, in my experience, is simply to restart the session, change the random seed, and rerun the search. The issue is unrelated to the OS, the version of FLAML, or the size of the data used. I am not sure whether it is due to the customized estimators we use, but as I said, most of the time the process runs well without any problem; the issue only happens randomly. Moreover, the log file looks totally normal even when the issue happens. The reticulate setup might be a possible cause: unlike most users here, we use FLAML in R via reticulate, and it seems that very few people have experienced the same issue we did.

Not sure whether we can find a way to replicate this error on your end for troubleshooting. Currently it is a bit annoying because it makes new users confused and suspicious of the process. Let me know if you have any suggestions.

Thank you.

Yu Cao

flippercy commented on Jul 21, 2022

Sorry for missing this for a long time. Does it still exist, @flippercy?

sonichi commented on Oct 11, 2022

Hi @sonichi! Thank you for the reply.

The issue still happens randomly and is hard to reproduce for debugging. However, one observation is that it has never happened (so far, at least) when we ran FLAML in Python. Therefore, we suspect that it may be due to something in reticulate when we call FLAML from R.

Since our modeling process, especially the AutoML component, is moving to Python in AzureML, we probably do not need to worry about it right now. Let's see whether it happens again in the new environment.

Appreciate your help!

flippercy commented on Oct 12, 2022

Thank you @flippercy. In case I miss your message again in the future, you can reach me and the other maintainers on Discord.

sonichi commented on Oct 12, 2022