
ValueError: Input contains NaN, infinity or a value too large for dtype('float64')

ghost opened this issue 4 years ago · 12 comments

Bug Description

Similar to this: https://github.com/keras-team/autokeras/issues/554. Training stops in AutoModel for image binary classification.

Traceback (most recent call last):
  File "", line 30, in <module>
    auto_model.fit(x_train, y_train, epochs=epochs)
  File "C:\Anaconda3\envs\automl\lib\site-packages\autokeras\auto_model.py", line 258, in fit
    **kwargs)
  File "C:\Anaconda3\envs\automl\lib\site-packages\autokeras\engine\tuner.py", line 114, in search
    super().search(callbacks=new_callbacks, **fit_kwargs)
  File "C:\Anaconda3\envs\automl\lib\site-packages\kerastuner\engine\base_tuner.py", line 120, in search
    trial = self.oracle.create_trial(self.tuner_id)
  File "C:\Anaconda3\envs\automl\lib\site-packages\kerastuner\engine\oracle.py", line 149, in create_trial
    response = self._populate_space(trial_id)
  File "C:\Anaconda3\envs\automl\lib\site-packages\kerastuner\tuners\bayesian.py", line 103, in _populate_space
    self.gpr.fit(x, y)
  File "C:\Anaconda3\envs\automl\lib\site-packages\sklearn\gaussian_process\_gpr.py", line 191, in fit
    ensure_2d=True, dtype="numeric")
  File "C:\Anaconda3\envs\automl\lib\site-packages\sklearn\utils\validation.py", line 758, in check_X_y
    dtype=None)
  File "C:\Anaconda3\envs\automl\lib\site-packages\sklearn\utils\validation.py", line 578, in check_array
    allow_nan=force_all_finite == 'allow-nan')
  File "C:\Anaconda3\envs\automl\lib\site-packages\sklearn\utils\validation.py", line 60, in _assert_all_finite
    msg_dtype if msg_dtype is not None else X.dtype)
ValueError: Input contains NaN, infinity or a value too large for dtype('float64').

Bug Reproduction

Code for reproducing the bug:

input_node = ak.ImageInput()
output_node = ak.ImageBlock()(input_node)
output_node = ak.ClassificationHead()(output_node)
auto_model = ak.AutoModel(tuner='bayesian',
                          inputs=input_node,
                          outputs=output_node,
                          max_trials=50,
                          directory=out_dir,
                          overwrite=True)
auto_model.fit(x_train, y_train, epochs=200)

Data used by the code: a custom image dataset with x_train.shape = (998, 100, 100, 3).

Expected Behavior

Training should continue until max_trials is reached.

Setup Details

Include the details about the versions of:

  • OS type and version: Win 10 Pro 64-bit
  • Python: 3.7.6
  • autokeras: 1.0.2
  • keras-tuner: 1.0.1
  • scikit-learn: 0.22.1
  • numpy: 1.18.1
  • pandas: 1.0.1
  • tensorflow-gpu: 2.1.0

Additional context

None.

ghost avatar Apr 09 '20 12:04 ghost

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] avatar Jun 15 '20 05:06 stale[bot]

I have the same problem even though the input is fine (the tuner had already completed several trials on the same input data).

MagdyIbrahim87 avatar Jun 23 '20 14:06 MagdyIbrahim87

We have fixed this problem for the new release. Please try autokeras 1.0.3. Thanks.

haifeng-jin avatar Jun 24 '20 03:06 haifeng-jin

I still receive that error with autokeras 1.0.3 and tuner='bayesian'. As with @MagdyIbrahim87, it occurs after a few trials.

naitslup avatar Jul 10 '20 11:07 naitslup

I have the same issue. It happens when the score of the trial is NaN in my case.

We need to reopen the issue.

A workaround would be to remove the trial that has a NaN score.

q-55555 avatar Sep 16 '20 13:09 q-55555

I also suspect this issue still exists, based on recent user feedback. Can anyone provide a Colab notebook for reproducing it?

haifeng-jin avatar Sep 26 '20 05:09 haifeng-jin

@haifeng-jin, it is difficult to provide a Colab notebook because the error depends on the training parameters chosen by keras-tuner and autokeras.

All I know is that the ValueError is thrown in the _populate_space function in kerastuner/tuners/bayesian.py.

Below is an example of a trial JSON that causes the error (note that several of the metric values and the score are NaN):

{"trial_id": "5e9e66472f6f827a8a4b5b8286adb365", "hyperparameters": {"space": [{"class_name": "Boolean", "config": {"name": "structured_data_block_1/normalize", "default": false, "conditions": []}}, {"class_name": "Choice", "config": {"name": "structured_data_block_1/dense_block_1/num_layers", "default": 2, "conditions": [], "values": [1, 2, 3], "ordered": true}}, {"class_name": "Boolean", "config": {"name": "structured_data_block_1/dense_block_1/use_batchnorm", "default": false, "conditions": []}}, {"class_name": "Choice", "config": {"name": "structured_data_block_1/dense_block_1/dropout", "default": 0, "conditions": [], "values": [0.0, 0.25, 0.5], "ordered": true}}, {"class_name": "Choice", "config": {"name": "structured_data_block_1/dense_block_1/units_0", "default": 32, "conditions": [], "values": [16, 32, 64, 128, 256, 512, 1024], "ordered": true}}, {"class_name": "Choice", "config": {"name": "structured_data_block_1/dense_block_1/units_1", "default": 32, "conditions": [], "values": [16, 32, 64, 128, 256, 512, 1024], "ordered": true}}, {"class_name": "Choice", "config": {"name": "regression_head_1/dropout", "default": 0, "conditions": [], "values": [0.0, 0.25, 0.5], "ordered": true}}, {"class_name": "Choice", "config": {"name": "optimizer", "default": "adam", "conditions": [], "values": ["adam", "sgd", "adam_weight_decay"], "ordered": false}}, {"class_name": "Choice", "config": {"name": "learning_rate", "default": 0.001, "conditions": [], "values": [0.1, 0.01, 0.001, 0.0001, 2e-05, 1e-05], "ordered": true}}, {"class_name": "Choice", "config": {"name": "structured_data_block_1/dense_block_1/units_2", "default": 32, "conditions": [], "values": [16, 32, 64, 128, 256, 512, 1024], "ordered": true}}], "values": {"structured_data_block_1/normalize": false, "structured_data_block_1/dense_block_1/num_layers": 3, "structured_data_block_1/dense_block_1/use_batchnorm": false, "structured_data_block_1/dense_block_1/dropout": 0.5, 
"structured_data_block_1/dense_block_1/units_0": 256, "structured_data_block_1/dense_block_1/units_1": 64, "regression_head_1/dropout": 0.25, "optimizer": "sgd", "learning_rate": 0.1, "structured_data_block_1/dense_block_1/units_2": 64}}, "metrics": {"metrics": {"loss": {"direction": "min", "observations": [{"value": [NaN], "step": 0}]}, "mean_squared_error": {"direction": "min", "observations": [{"value": [NaN], "step": 0}]}, "val_loss": {"direction": "min", "observations": [{"value": [NaN], "step": 0}]}, "val_mean_squared_error": {"direction": "min", "observations": [{"value": [NaN], "step": 0}]}}}, "score": NaN, "best_step": null, "status": "COMPLETED"}

q-55555 avatar Oct 21 '20 08:10 q-55555

I think I understand the bug. It is caused by a NaN value in the validation loss, which is the metric used for evaluation. We will need to handle it in Keras Tuner.
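One way the tuner could handle this is to sanitize trial scores before fitting the Gaussian-process regressor. The sketch below is an assumption about a possible fix, not the actual Keras Tuner patch; the function name sanitize_scores is hypothetical.

```python
import numpy as np

def sanitize_scores(scores, direction="min"):
    """Replace NaN/inf trial scores with the worst finite score seen,
    so the Gaussian-process regressor never receives non-finite targets.
    """
    scores = np.asarray(scores, dtype=float)
    finite = scores[np.isfinite(scores)]
    if finite.size == 0:
        # No usable trials yet; fall back to zeros rather than crash.
        return np.zeros_like(scores)
    worst = finite.max() if direction == "min" else finite.min()
    return np.where(np.isfinite(scores), scores, worst)
```

Using the worst observed score (rather than a fixed sentinel like 999) keeps the penalty on the same scale as the objective, so the surrogate model is not distorted.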

haifeng-jin avatar Oct 23 '20 15:10 haifeng-jin

@haifeng-jin I am experiencing the same issue when the val_loss is NaN. I'm using BayesianOptimization as a tuner.

aelgharabawy avatar Feb 16 '21 18:02 aelgharabawy

I have the same issue. It happens when the score of the trial is NaN in my case.

We need to reopen the issue.

A workaround would be to remove the trial that has a NaN score.

@q-55555 Can you let me know how to do that? (removing a trial for which the val_loss is NaN)

aelgharabawy avatar Feb 16 '21 18:02 aelgharabawy

I have the same issue with the latest AutoKeras release, using AutoModel for image regression with the 'bayesian' tuner. Update: I think I may have found the culprit for this error. It's the metrics_tracking.py script in the Keras Tuner package: kerastuner/engine/metrics_tracking.py.

I modified the get_best_value(self) function at line 86:

def get_best_value(self):
    values = list(
        obs.mean() for obs in self._observations.values())
    if not values:
        return None
    if self.direction == 'min':
        # If every observation is NaN, np.nanmin also returns NaN,
        # which later crashes the Bayesian oracle's gpr.fit() call.
        # Substitute a large sentinel so the trial just scores badly.
        if np.isnan(np.nanmin(values)):
            return 999
        else:
            return np.nanmin(values)
    return np.nanmax(values)

I think that when the metric scores (loss or a custom metric) are all NaN, np.nanmin(values) returns NaN, which then leads to the ValueError. Instead, we want np.nanmin(values) to return a number, in this case the sentinel 999. Doing this avoids the dreaded ValueError exception and AutoKeras happily continues:

15/15 - 24s - loss: nan - mae: nan - mape: nan - val_loss: nan - val_mae: nan - val_mape: nan
Epoch 2/200
15/15 - 1s - loss: nan - mae: nan - mape: nan - val_loss: nan - val_mae: nan - val_mape: nan
[... epochs 3-10 identical ...]
/home/joshua/PycharmProjects/pythonProject2/venv/lib/python3.8/site-packages/kerastuner/engine/metrics_tracking.py:92: RuntimeWarning: All-NaN axis encountered
  if np.isnan(np.nanmin(values)):
Trial 8 Complete [00h 00m 35s]
val_loss: 999

Best val_loss So Far: 305.5285949707031
Total elapsed time: 00h 19m 06s

This fix prevents AutoKeras from crashing due to ValueError. I am still testing it further, hopefully this helps!
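The all-NaN behavior this patch guards against is easy to reproduce in isolation: np.nanmin over a list with no finite entries returns NaN and emits the same All-NaN RuntimeWarning seen in the log above, and that NaN is the value that eventually reaches sklearn's finiteness check.

```python
import warnings
import numpy as np

values = [float("nan"), float("nan"), float("nan")]
with warnings.catch_warnings():
    # Suppress the "All-NaN axis encountered" RuntimeWarning for the demo.
    warnings.simplefilter("ignore", RuntimeWarning)
    best = np.nanmin(values)

print(np.isnan(best))  # True: nanmin has no finite value to fall back on
```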

koh-joshua avatar Feb 25 '21 20:02 koh-joshua

https://github.com/keras-team/keras-tuner/issues/477#issuecomment-786920437 The TerminateOnNaN callback may be a temporary solution.
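TerminateOnNaN (tf.keras.callbacks.TerminateOnNaN) stops training as soon as a batch loss becomes NaN or infinite, so a diverging trial ends early instead of producing the all-NaN history that crashes the Bayesian oracle. A minimal pure-Python mimic of the check it performs, shown here instead of a TensorFlow example to stay dependency-free:

```python
import math

def should_terminate(logs):
    # Same idea as Keras's TerminateOnNaN: inspect the per-batch logs
    # and stop training when the loss is NaN or infinite.
    loss = logs.get("loss")
    return loss is not None and not math.isfinite(loss)

print(should_terminate({"loss": float("nan")}))  # True
print(should_terminate({"loss": 0.42}))          # False
```

In the scenario of this issue, the callback would be passed via auto_model.fit(..., callbacks=[tf.keras.callbacks.TerminateOnNaN()]).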

haifeng-jin avatar Mar 01 '21 20:03 haifeng-jin