BaseLogger callback seems broken in v2 - KeyError and wrong seen-sample computation
System information.
- Have I written custom code (as opposed to using a stock example script provided in Keras): TF 2.4.1
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 18.04
- TensorFlow installed from (source or binary): binary
- TensorFlow version (use command below):
- Python version: 3.8.3
- Bazel version (if compiling from source):
- GPU model and memory:
- Exact command to reproduce:
Describe the problem.
It looks like BaseLogger in Keras broke somewhere in the migration from v1 to v2:
import tensorflow as tf
import numpy as np
model = tf.keras.Sequential()
model.add(tf.keras.layers.Dense(8, input_shape=(5,)))
model.add(tf.keras.layers.Dense(1))
model.compile(optimizer="Adam", loss="binary_crossentropy")
x = np.random.rand(4,5)
y = np.random.randint(0, 2, (4,))
model.fit(x, y, epochs=10, callbacks=[tf.keras.callbacks.BaseLogger()])
I end up getting KeyError: 'metrics'.
I have been hit by this and am trying to understand what the TensorFlow/Keras team's vision is for BaseLogger.
It looks like BaseLogger used to be applied by default to every Keras model, but this is only true for TensorFlow v1 models, as indicated here: https://github.com/tensorflow/tensorflow/blob/e104b6a9e87ea5956451ab56c1f5ca486c511bc4/tensorflow/python/keras/callbacks.py#L73-L112. Stepping through this code path, it appears that BaseLogger is no longer applied by default to v2 models.
Also, BaseLogger appears to be severely broken for v2 models, because no 'metrics' key exists in self.params: https://github.com/tensorflow/tensorflow/blob/e104b6a9e87ea5956451ab56c1f5ca486c511bc4/tensorflow/python/keras/callbacks.py#L944. I think the right fix here is to change self.params['metrics'] to logs.keys(). However, the averaging also seems broken: self.seen is always zero for generator- and dataset-based training, because the batch size value is always 0 in those cases. In addition, here https://github.com/tensorflow/tensorflow/blob/e104b6a9e87ea5956451ab56c1f5ca486c511bc4/tensorflow/python/keras/callbacks.py#L927-L930
size and num_steps no longer exist in logs for v2 models, so the defaults of 0 and 1 are picked. That makes seen == 0 and leads to ZeroDivisionError: float division by zero.
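The two failure modes described above can be reproduced without TensorFlow. The sketch below is a simplified stand-in that paraphrases the averaging logic at the linked lines; it is not the Keras source. The class name LegacyStyleLogger, the helper run_epoch, and the contents of params (no 'metrics' key) are assumptions made for illustration.

```python
class LegacyStyleLogger:
    """Simplified stand-in for the averaging logic discussed above (not Keras code)."""

    def __init__(self, params):
        # Assumption: under v2, params carries no 'metrics' key.
        self.params = params

    def on_epoch_begin(self):
        self.seen = 0
        self.totals = {}

    def on_batch_end(self, logs):
        batch_size = logs.get("size", 0)      # 'size' is absent, so this is 0
        num_steps = logs.get("num_steps", 1)  # 'num_steps' is absent, so this is 1
        self.seen += batch_size * num_steps
        for k, v in logs.items():
            self.totals[k] = self.totals.get(k, 0.0) + v * batch_size

    def on_epoch_end(self, logs, metric_names=None):
        # metric_names=None mimics the current lookup in params;
        # passing list(logs) mimics the proposed logs.keys() change.
        names = self.params["metrics"] if metric_names is None else metric_names
        for k in names:
            if k in self.totals:
                logs[k] = self.totals[k] / self.seen


def run_epoch(logger, use_log_keys):
    """Drive one epoch with a single batch whose logs hold only a loss value."""
    logger.on_epoch_begin()
    logger.on_batch_end({"loss": 7.7125})
    logs = {"loss": 7.7125}
    logger.on_epoch_end(logs, metric_names=list(logs) if use_log_keys else None)
    return logs


logger = LegacyStyleLogger({"verbose": 1, "epochs": 10, "steps": 1})
for use_log_keys in (False, True):
    try:
        run_epoch(logger, use_log_keys)
    except (KeyError, ZeroDivisionError) as exc:
        print(type(exc).__name__, exc)
```

With the params lookup the epoch ends in KeyError; switching to the keys of logs gets past that point but then divides by a seen count of zero, which is why changing the lookup alone would not be a complete fix.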
Is there a recommended alternative to BaseLogger for v2, then? The documentation (https://www.tensorflow.org/api_docs/python/tf/keras/callbacks/BaseLogger?hl=en) suggests that BaseLogger is still applied by default to every Keras model, v1 or v2. I disagree: this is no longer true with v2. I never hit this code path unless I have a custom callback based on BaseLogger.
This was raised earlier in the TF codebase here: https://github.com/tensorflow/tensorflow/issues/46344, but I am confused by the inactivity on that issue. Surely this is a critical bug that should be fixed ASAP if BaseLogger is the way to customize the log stream. If it is not, what is the alternative?
If I were to ignore BaseLogger completely, what is the recommended way to write a custom Callback that computes seen sample counts when training from a generator or dataset, where model.fit is not called with a batch size?
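One possible shape for such a callback, offered as a sketch and not as a recommendation from the Keras team: since the logs carry no batch size, the callback is told the batch size up front and counts completed batches. The class name SeenSamplesTracker and its batch_size argument are hypothetical. It is written as plain Python so it can be driven by hand; the hook names match tf.keras.callbacks.Callback, so for use with model.fit it would presumably subclass that class and call super().__init__().

```python
class SeenSamplesTracker:
    """Hypothetical callback-style counter; batch_size is supplied by the caller."""

    def __init__(self, batch_size):
        self.batch_size = batch_size
        self.seen = 0
        self.history = []  # one dict of epoch-end values per epoch

    def on_epoch_begin(self, epoch, logs=None):
        # Restart the count at the start of every epoch.
        self.seen = 0

    def on_train_batch_end(self, batch, logs=None):
        # Assumption: every batch is full. A short final batch is
        # over-counted, so 'seen' is an upper bound in that case.
        self.seen += self.batch_size

    def on_epoch_end(self, epoch, logs=None):
        logs = {} if logs is None else logs
        logs["seen"] = self.seen  # make the count visible to later callbacks
        self.history.append(dict(logs))


# Drive the hooks by hand: one epoch of three batches of four samples.
tracker = SeenSamplesTracker(batch_size=4)
tracker.on_epoch_begin(0)
for step in range(3):
    tracker.on_train_batch_end(step, {"loss": 0.5})
epoch_logs = {"loss": 0.5}
tracker.on_epoch_end(0, epoch_logs)
print(epoch_logs["seen"])
```

No per-batch averaging is done here. As far as I understand, the metric values that v2 passes in logs are already aggregated over the epoch, but that is worth verifying against the TensorFlow version in use.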
Describe the current behavior.
KeyError, and I think the logic for seen samples and averaging is also wrong.
Describe the expected behavior.
BaseLogger should work as documented.
- Do you want to contribute a PR? (yes/no): yes, if I get a better understanding of the vision for BaseLogger
Standalone code to reproduce the issue. Please see above
Source code / logs.
Please see above
I am able to reproduce the error. Please check the gist here.
The following is the error trace
Epoch 1/10
1/1 [==============================] - 0s 377ms/step - loss: 7.7125
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
<ipython-input-4-a9e266b0dcc1> in <module>()
8 x = np.random.rand(4,5)
9 y = np.random.randint(0, 2, (4,))
---> 10 model.fit(x, y, epochs=10, callbacks=[tf.keras.callbacks.BaseLogger()])
1 frames
/usr/local/lib/python3.7/dist-packages/keras/callbacks.py in on_epoch_end(self, epoch, logs)
920 def on_epoch_end(self, epoch, logs=None):
921 if logs is not None:
--> 922 for k in self.params['metrics']:
923 if k in self.totals:
924 # Make value available to next callbacks.
KeyError: 'metrics'
Hello, Thank you for reporting an issue.
We're currently in the process of migrating the new Keras 3 code base from keras-team/keras-core to keras-team/keras. Consequently, this issue may not be relevant to the Keras 3 code base. After the migration is successfully completed, feel free to reopen this issue at keras-team/keras if you believe it remains relevant to the Keras 3 code base. If this issue is instead a bug or security issue in legacy tf.keras, you can report a new issue at keras-team/tf-keras, which hosts the TensorFlow-only, legacy version of Keras.
To know more about Keras 3, please read https://keras.io/keras_core/announcement/