
max_model_size is not useful with ImageClassifier

Open · eschibli opened this issue on Jan 04 '21 · 10 comments

Bug Description

When max_model_size is set below 63812570, ImageClassifier will repeatedly try to build a model with an EfficientNet-B7 backbone and then throw a RuntimeError. EfficientNet-B7 is a very large model, too large for most GPUs, so it would be very useful to be able to enforce a smaller maximum number of model weights.

I know it is possible to work around this issue by defining a custom search space that excludes the largest backbones, but there does not appear to be a simple way to exclude all of the large ones while keeping, say, the smaller EfficientNet variants. A sketch of the exclusion workaround follows.
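
For reference, a minimal sketch of that workaround using the functional API (ak.AutoModel with ak.ImageBlock; pinning block_type to "vanilla" drops every pretrained backbone, which is more drastic than excluding only the large EfficientNets):

import autokeras as ak

# Pin the image block to the small "vanilla" CNN so that no pretrained
# backbone (ResNet/Xception/EfficientNet) is ever instantiated.
input_node = ak.ImageInput()
output_node = ak.ImageBlock(block_type="vanilla")(input_node)
output_node = ak.ClassificationHead()(output_node)
clf = ak.AutoModel(
    inputs=input_node,
    outputs=output_node,
    max_trials=10,
    overwrite=True,
)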

Bug Reproduction

Code:

"""
Search for a good model for the
[MNIST](https://keras.io/datasets/#mnist-database-of-handwritten-digits) dataset.
"""

from tensorflow.keras.datasets import mnist
import autokeras as ak

# Prepare the dataset.
(x_train, y_train), (x_test, y_test) = mnist.load_data()
print(x_train.shape)  # (60000, 28, 28)
print(y_train.shape)  # (60000,)
print(y_train[:3])  # array([5, 0, 4], dtype=uint8)

# Initialize the ImageClassifier.
clf = ak.ImageClassifier(max_trials=10, max_model_size=25000000, overwrite=True)
# Search for the best model.
clf.fit(x_train, y_train, epochs=5)
# Evaluate on the testing data.
print('Accuracy: {accuracy}'.format(
    accuracy=clf.evaluate(x_test, y_test)))

Output:

Trial 2 Complete [00h 22m 27s]
val_loss: 0.1945810168981552

Best val_loss So Far: 0.04196520149707794
Total elapsed time: 00h 23m 06s

Search: Running Trial #3

Hyperparameter    |Value             |Best Value So Far 
image_block_1/b...|efficient         |vanilla           
image_block_1/n...|True              |True              
image_block_1/a...|True              |False             
image_block_1/i...|True              |None              
image_block_1/i...|False             |None              
image_block_1/i...|0                 |None              
image_block_1/i...|0                 |None              
image_block_1/i...|0.1               |None              
image_block_1/i...|0                 |None              
image_block_1/e...|True              |None              
image_block_1/e...|b7                |None              
image_block_1/e...|True              |None              
image_block_1/e...|True              |None              
classification_...|global_avg        |flatten           
classification_...|0                 |0.5               
optimizer         |adam              |adam              
learning_rate     |2e-05             |0.001             

Downloading data from https://storage.googleapis.com/keras-applications/efficientnetb7_notop.h5
258080768/258076736 [==============================] - 2s 0us/step
Oversized model: 63812570 parameters -- skipping
Oversized model: 63812570 parameters -- skipping
Oversized model: 63812570 parameters -- skipping
Oversized model: 63812570 parameters -- skipping
Oversized model: 63812570 parameters -- skipping
Oversized model: 63812570 parameters -- skipping

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-3-b04aed537333> in <module>
     36 clf = ak.ImageClassifier(max_trials=10, max_model_size=25000000, overwrite=True)
     37 # Search for the best model.
---> 38 clf.fit(x_train, y_train, epochs=5)
     39 # Evaluate on the testing data.
     40 print('Accuracy: {accuracy}'.format(

/opt/conda/lib/python3.7/site-packages/autokeras/tasks/image.py in fit(self, x, y, epochs, callbacks, validation_split, validation_data, **kwargs)
    164             validation_split=validation_split,
    165             validation_data=validation_data,
--> 166             **kwargs
    167         )
    168 

/opt/conda/lib/python3.7/site-packages/autokeras/auto_model.py in fit(self, x, y, batch_size, epochs, callbacks, validation_split, validation_data, **kwargs)
    277             validation_data=validation_data,
    278             validation_split=validation_split,
--> 279             **kwargs
    280         )
    281 

/opt/conda/lib/python3.7/site-packages/autokeras/engine/tuner.py in search(self, epochs, callbacks, validation_split, **fit_kwargs)
    179         self.oracle.update_space(hp)
    180 
--> 181         super().search(epochs=epochs, callbacks=new_callbacks, **fit_kwargs)
    182 
    183         # Train the best model use validation data.

/opt/conda/lib/python3.7/site-packages/kerastuner/engine/base_tuner.py in search(self, *fit_args, **fit_kwargs)
    129 
    130             self.on_trial_begin(trial)
--> 131             self.run_trial(trial, *fit_args, **fit_kwargs)
    132             self.on_trial_end(trial)
    133         self.on_search_end()

/opt/conda/lib/python3.7/site-packages/kerastuner/engine/tuner.py in run_trial(self, trial, *fit_args, **fit_kwargs)
    170         copied_fit_kwargs['callbacks'] = callbacks
    171 
--> 172         self._build_and_fit_model(trial, fit_args, copied_fit_kwargs)
    173 
    174     def save_model(self, trial_id, model, step=0):

/opt/conda/lib/python3.7/site-packages/autokeras/engine/tuner.py in _build_and_fit_model(self, trial, fit_args, fit_kwargs)
     96         pipeline.save(self._pipeline_path(trial.trial_id))
     97 
---> 98         model = self.hypermodel.build(trial.hyperparameters)
     99         self.adapt(model, fit_kwargs["x"])
    100 

/opt/conda/lib/python3.7/site-packages/kerastuner/engine/hypermodel.py in _build_wrapper(self, hp, *args, **kwargs)
     62             # to the search space.
     63             hp = hp.copy()
---> 64         return self._build(hp, *args, **kwargs)
     65 
     66 

/opt/conda/lib/python3.7/site-packages/kerastuner/engine/hypermodel.py in build(self, hp)
    126                 if i == self._max_fail_streak:
    127                     raise RuntimeError(
--> 128                         'Too many consecutive oversized models.')
    129                 continue
    130             break

RuntimeError: Too many consecutive oversized models.

Expected Behavior

ImageClassifier does not repeatedly try backbones with more weights than max_model_size, and instead skips them and explores the remaining search space.

Setup Details

  • Python: 3.7.6
  • autokeras: 1.0.12
  • tensorflow: 2.3.1

Additional context

I am new to AutoML, but not to machine learning, so please bear with me if I'm missing something obvious. I searched for max_model_size and didn't find another report of this issue.

eschibli · Jan 04 '21

As a temporary solution, you may try using random search as the tuning algorithm. We will fix this issue systematically later.
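
A minimal sketch of that suggestion (the task APIs accept a tuner argument; "random", "bayesian", "hyperband", and the default "greedy" are the available names in 1.0.x):

import autokeras as ak

# Select the random-search tuner instead of the default greedy tuner.
clf = ak.ImageClassifier(
    tuner="random",
    max_trials=10,
    max_model_size=25000000,
    overwrite=True,
)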

haifeng-jin · Feb 15 '21

Thanks for the suggestion, haifeng. That should help in the meantime.

eschibli · Feb 17 '21

Hi @eschibli! Did you try tuner='random'? I am trying something similar to your test, but I have the same problem. I am further limiting the model size to only 153600 parameters in order to find small architectures for TensorFlow Lite for Microcontrollers. Could you suggest a way to use ImageClassifier for this? Thank you so much!
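
In case it helps, a hedged sketch for the microcontroller use case, combining a restricted search space with the size cap (this assumes ak.AutoModel accepts max_model_size and that ConvBlock's num_blocks/num_layers arguments exist in your version; the values are illustrative):

import autokeras as ak

# ConvBlock searches only plain convolutional stacks; pinning it to a
# single block with a single layer keeps candidate models tiny.
input_node = ak.ImageInput()
output_node = ak.ConvBlock(num_blocks=1, num_layers=1)(input_node)
output_node = ak.ClassificationHead()(output_node)
clf = ak.AutoModel(
    inputs=input_node,
    outputs=output_node,
    max_trials=20,
    max_model_size=153600,
    overwrite=True,
)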

elloza · Mar 05 '21

@haifeng-jin @elloza

I did try, and as elloza says, it isn't working. Varying the tuner does not appear to prevent the same model from being repeatedly tried and rejected for size.

eschibli · Mar 15 '21

I am running into the same problem.

sunzhe09 · Mar 16 '21

On reflection, the default (greedy) tuner should be better suited to this problem. It goes through a list of initial models (I would expect at least one of them to work and not be oversized) and then makes modifications to the best one.

From the error message above, it seems there is a bug: the search fails while going through the initial model list. I will inspect it.

haifeng-jin · Mar 18 '21

A possible fix for a pull request:

  • Handle the consecutive-oversized-models error during the initial trials.
  • Add a way to mark a trial as invalid so that it will not be selected as the best trial later.

haifeng-jin · Mar 18 '21

@haifeng-jin Without looking at the current code, it seems like simply assigning a very high or infinite loss to invalid models would solve the problem rather elegantly.
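
A conceptual sketch of that idea (this is not the actual keras-tuner code; build_or_penalize and its return convention are made up for illustration):

def build_or_penalize(build_fn, hp, max_model_size):
    # Build the candidate model for this set of hyperparameters.
    model = build_fn(hp)
    if model.count_params() > max_model_size:
        # Instead of raising after repeated failures, report a
        # worst-possible loss: the oracle records the trial but can
        # never select it as the best, so the search simply moves on.
        return None, float("inf")
    return model, None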

eschibli · Mar 30 '21

Any updates on this topic? I'm also interested in small models with at most 150,000 parameters.

Are there better solutions so far than changing self._max_fail_streak in kerastuner/engine/hypermodel.py?
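
In case it is useful to others, a hedged sketch of applying that change without editing the installed package. This assumes the wrapper object reachable as clf.tuner.hypermodel is the one carrying _max_fail_streak (both names are private internals of autokeras/keras-tuner 1.0.x and may differ in other versions):

import autokeras as ak
from tensorflow.keras.datasets import mnist

(x_train, y_train), _ = mnist.load_data()
clf = ak.ImageClassifier(max_trials=10, max_model_size=150000, overwrite=True)
# The hypermodel wrapper counts consecutive oversized builds and raises
# RuntimeError('Too many consecutive oversized models.') once the streak
# hits _max_fail_streak; raising the limit buys more retries per trial.
clf.tuner.hypermodel._max_fail_streak = 50
clf.fit(x_train, y_train, epochs=5)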

andife · Jun 09 '21

This appears to be fixed in v1.1 for max_model_size >= 23555082 (the size of the hardcoded ResNet model?). For anything smaller, a FailedTrialError is raised and not handled.
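
For reference, a quick way to check where that threshold might come from; 23555082 is close to (though not exactly) the parameter count of ResNet50 without its top layer, so the guess below is an assumption:

from tensorflow.keras.applications import ResNet50

# ResNet50 with include_top=False has roughly 23.6M parameters,
# in the same ballpark as the 23555082 floor reported above.
backbone = ResNet50(include_top=False, weights=None)
print(backbone.count_params())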

eschibli · Jun 08 '23