ValueError: Input contains NaN, infinity or a value too large for dtype('float64')
Bug Description
Similar to this issue: https://github.com/keras-team/autokeras/issues/554
Training stops in AutoModel for image binary classification.

Traceback (most recent call last):
  File "
  File "C:\Anaconda3\envs\automl\lib\site-packages\autokeras\auto_model.py", line 258, in fit
    **kwargs)
  File "C:\Anaconda3\envs\automl\lib\site-packages\autokeras\engine\tuner.py", line 114, in search
    super().search(callbacks=new_callbacks, **fit_kwargs)
  File "C:\Anaconda3\envs\automl\lib\site-packages\kerastuner\engine\base_tuner.py", line 120, in search
    trial = self.oracle.create_trial(self.tuner_id)
  File "C:\Anaconda3\envs\automl\lib\site-packages\kerastuner\engine\oracle.py", line 149, in create_trial
    response = self._populate_space(trial_id)
  File "C:\Anaconda3\envs\automl\lib\site-packages\kerastuner\tuners\bayesian.py", line 103, in _populate_space
    self.gpr.fit(x, y)
  File "C:\Anaconda3\envs\automl\lib\site-packages\sklearn\gaussian_process\_gpr.py", line 191, in fit
    ensure_2d=True, dtype="numeric")
  File "C:\Anaconda3\envs\automl\lib\site-packages\sklearn\utils\validation.py", line 758, in check_X_y
    dtype=None)
  File "C:\Anaconda3\envs\automl\lib\site-packages\sklearn\utils\validation.py", line 578, in check_array
    allow_nan=force_all_finite == 'allow-nan')
  File "C:\Anaconda3\envs\automl\lib\site-packages\sklearn\utils\validation.py", line 60, in _assert_all_finite
    msg_dtype if msg_dtype is not None else X.dtype)
ValueError: Input contains NaN, infinity or a value too large for dtype('float64').
Bug Reproduction
Code for reproducing the bug:

    input_node = ak.ImageInput()
    output_node = ak.ImageBlock()(input_node)
    output_node = ak.ClassificationHead()(output_node)
    auto_model = ak.AutoModel(tuner='bayesian',
                              inputs=input_node,
                              outputs=output_node,
                              max_trials=50,
                              directory=out_dir,
                              overwrite=True)
    auto_model.fit(x_train, y_train, epochs=200)
Data used by the code: a custom image dataset with x_train.shape = (998, 100, 100, 3).
Expected Behavior
Training should continue until max_trials is reached.
Setup Details
Include the details about the versions of:
- OS type and version: Win 10 Pro 64-bit
- Python: 3.7.6
- autokeras: 1.0.2
- keras-tuner: 1.0.1
- scikit-learn: 0.22.1
- numpy: 1.18.1
- pandas: 1.0.1
- tensorflow-gpu: 2.1.0
Additional context
None.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
I have the same problem even though the input is fine (the tuner has already made several trials with the same input data).
We have fixed this problem for the new release. Please try autokeras 1.0.3. Thanks.
I still receive that error with autokeras 1.0.3, with tuner='bayesian'. As with @MagdyIbrahim87, it occurs after a few trials.
I have the same issue. It happens when the score of the trial is NaN in my case.
We need to reopen the issue.
A workaround would be to remove the trial that has a NaN score.
I also suspect this issue still exists, based on some recent user feedback. Can anyone provide a Colab notebook for reproducing the issue?
@haifeng-jin, it is difficult to provide a Colab because it depends on the training parameters chosen by keras-tuner and autokeras.
All I know is that the ValueError comes from the _populate_space function in kerastuner/tuners/bayesian.py.
Below is an example of a trial JSON that causes the error (you can see that some values are NaN):
{"trial_id": "5e9e66472f6f827a8a4b5b8286adb365", "hyperparameters": {"space": [{"class_name": "Boolean", "config": {"name": "structured_data_block_1/normalize", "default": false, "conditions": []}}, {"class_name": "Choice", "config": {"name": "structured_data_block_1/dense_block_1/num_layers", "default": 2, "conditions": [], "values": [1, 2, 3], "ordered": true}}, {"class_name": "Boolean", "config": {"name": "structured_data_block_1/dense_block_1/use_batchnorm", "default": false, "conditions": []}}, {"class_name": "Choice", "config": {"name": "structured_data_block_1/dense_block_1/dropout", "default": 0, "conditions": [], "values": [0.0, 0.25, 0.5], "ordered": true}}, {"class_name": "Choice", "config": {"name": "structured_data_block_1/dense_block_1/units_0", "default": 32, "conditions": [], "values": [16, 32, 64, 128, 256, 512, 1024], "ordered": true}}, {"class_name": "Choice", "config": {"name": "structured_data_block_1/dense_block_1/units_1", "default": 32, "conditions": [], "values": [16, 32, 64, 128, 256, 512, 1024], "ordered": true}}, {"class_name": "Choice", "config": {"name": "regression_head_1/dropout", "default": 0, "conditions": [], "values": [0.0, 0.25, 0.5], "ordered": true}}, {"class_name": "Choice", "config": {"name": "optimizer", "default": "adam", "conditions": [], "values": ["adam", "sgd", "adam_weight_decay"], "ordered": false}}, {"class_name": "Choice", "config": {"name": "learning_rate", "default": 0.001, "conditions": [], "values": [0.1, 0.01, 0.001, 0.0001, 2e-05, 1e-05], "ordered": true}}, {"class_name": "Choice", "config": {"name": "structured_data_block_1/dense_block_1/units_2", "default": 32, "conditions": [], "values": [16, 32, 64, 128, 256, 512, 1024], "ordered": true}}], "values": {"structured_data_block_1/normalize": false, "structured_data_block_1/dense_block_1/num_layers": 3, "structured_data_block_1/dense_block_1/use_batchnorm": false, "structured_data_block_1/dense_block_1/dropout": 0.5, "structured_data_block_1/dense_block_1/units_0": 256, "structured_data_block_1/dense_block_1/units_1": 64, "regression_head_1/dropout": 0.25, "optimizer": "sgd", "learning_rate": 0.1, "structured_data_block_1/dense_block_1/units_2": 64}}, "metrics": {"metrics": {"loss": {"direction": "min", "observations": [{"value": [NaN], "step": 0}]}, "mean_squared_error": {"direction": "min", "observations": [{"value": [NaN], "step": 0}]}, "val_loss": {"direction": "min", "observations": [{"value": [NaN], "step": 0}]}, "val_mean_squared_error": {"direction": "min", "observations": [{"value": [NaN], "step": 0}]}}}, "score": NaN, "best_step": null, "status": "COMPLETED"}
I think I understand the bug. It is caused by a NaN value in the validation loss, which is the metric used for evaluation. We will need to handle it in Keras Tuner.
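(For illustration only, not the actual Keras Tuner patch: one way the Bayesian oracle could guard the Gaussian-process fit is to drop trials with non-finite scores before calling gpr.fit, roughly like the sketch below. The helper name drop_non_finite_scores is made up for this example.)

    import numpy as np

    def drop_non_finite_scores(x, y):
        # x: encoded hyperparameter vectors, y: trial scores (may contain NaN/inf).
        # Keep only the pairs whose score is finite so that sklearn's
        # GaussianProcessRegressor.fit never sees a non-finite target.
        x = np.asarray(x, dtype=float)
        y = np.asarray(y, dtype=float)
        mask = np.isfinite(y)
        return x[mask], y[mask]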
@haifeng-jin I am experiencing the same issue when the val_loss is NaN. I'm using BayesianOptimization as a tuner.
@q-55555 Can you let me know how to do that? (removing a trial for which the val_loss is NaN)
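For anyone who wants to try that by hand, here is a rough sketch, assuming the keras-tuner 1.0.x on-disk layout of <directory>/<project_name>/trial_<id>/trial.json and that the score is stored under a top-level "score" key, as in the JSON above (remove_nan_trials is a made-up helper name, not a keras-tuner API):

    import glob
    import json
    import math
    import os
    import shutil

    def remove_nan_trials(project_dir):
        # Delete every trial_<id> folder whose recorded score is NaN.
        for trial_json in glob.glob(os.path.join(project_dir, "trial_*", "trial.json")):
            with open(trial_json) as f:
                trial = json.load(f)  # Python's json parses the literal NaN as float('nan')
            score = trial.get("score")
            if isinstance(score, float) and math.isnan(score):
                shutil.rmtree(os.path.dirname(trial_json))

    # e.g. remove_nan_trials(os.path.join(out_dir, "auto_model"))

Note that oracle.json in the same project directory also records which trials have been run, so deleting trial folders alone may not be enough for a resumed search; this is only a sketch of the idea.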
I have the same issue with the latest AutoKeras release, using AutoModel for image regression with the 'bayesian' tuner. Update: I think I may have found the culprit for this error. It's the metrics_tracking.py script in the Keras Tuner package: kerastuner/engine/metrics_tracking.py.
I have modified the get_best_value(self) function at line 86:

    def get_best_value(self):
        values = list(
            obs.mean() for obs in self._observations.values())
        if not values:
            return None
        if self.direction == 'min':
            # If every observation is NaN, np.nanmin(values) is NaN as well;
            # return a large sentinel instead so the Bayesian oracle never
            # receives a NaN score.
            if np.isnan(np.nanmin(values)):
                return 999
            else:
                return np.nanmin(values)
        return np.nanmax(values)
I think that when the metric scores (loss or a custom metric) are all NaN, np.nanmin(values) returns NaN, which then leads to the ValueError. Instead, we want to return a number in that case, here 999. Doing this avoids the dreaded ValueError exception and AutoKeras happily continues:
15/15 - 24s - loss: nan - mae: nan - mape: nan - val_loss: nan - val_mae: nan - val_mape: nan
Epoch 2/200
15/15 - 1s - loss: nan - mae: nan - mape: nan - val_loss: nan - val_mae: nan - val_mape: nan
Epoch 3/200
15/15 - 1s - loss: nan - mae: nan - mape: nan - val_loss: nan - val_mae: nan - val_mape: nan
Epoch 4/200
15/15 - 1s - loss: nan - mae: nan - mape: nan - val_loss: nan - val_mae: nan - val_mape: nan
Epoch 5/200
15/15 - 1s - loss: nan - mae: nan - mape: nan - val_loss: nan - val_mae: nan - val_mape: nan
Epoch 6/200
15/15 - 1s - loss: nan - mae: nan - mape: nan - val_loss: nan - val_mae: nan - val_mape: nan
Epoch 7/200
15/15 - 1s - loss: nan - mae: nan - mape: nan - val_loss: nan - val_mae: nan - val_mape: nan
Epoch 8/200
15/15 - 1s - loss: nan - mae: nan - mape: nan - val_loss: nan - val_mae: nan - val_mape: nan
Epoch 9/200
15/15 - 1s - loss: nan - mae: nan - mape: nan - val_loss: nan - val_mae: nan - val_mape: nan
Epoch 10/200
15/15 - 1s - loss: nan - mae: nan - mape: nan - val_loss: nan - val_mae: nan - val_mape: nan
/home/joshua/PycharmProjects/pythonProject2/venv/lib/python3.8/site-packages/kerastuner/engine/metrics_tracking.py:92: RuntimeWarning: All-NaN axis encountered
  if np.isnan(np.nanmin(values)):
Trial 8 Complete [00h 00m 35s]
val_loss: 999
Best val_loss So Far: 305.5285949707031
Total elapsed time: 00h 19m 06s
This fix prevents AutoKeras from crashing due to the ValueError. I am still testing it further; hopefully this helps!
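To see the failure mode this guards against in isolation (plain numpy, independent of Keras Tuner):

    import numpy as np

    values = [float("nan"), float("nan")]
    best = np.nanmin(values)   # nan, and numpy also emits an "All-NaN" RuntimeWarning here
    print(np.isnan(best))      # True -> the patched get_best_value returns the 999 sentinel instead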
https://github.com/keras-team/keras-tuner/issues/477#issuecomment-786920437 The TerminateOnNaN callback may be a temporary solution.
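For reference, AutoModel.fit forwards keyword arguments such as callbacks to the underlying tf.keras training loop, so a sketch of that workaround (reusing the reproduction variables from above) could look like:

    import tensorflow as tf

    # Stop a trial as soon as a batch produces a NaN loss, so the trial is
    # less likely to end up with a NaN score.
    auto_model.fit(x_train, y_train, epochs=200,
                   callbacks=[tf.keras.callbacks.TerminateOnNaN()])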