keras

BaseLogger callback seems broken in v2 - KeyError and wrong seen-sample computation

Open · suneeta-mall opened this issue 4 years ago · 1 comment

System information.

  • Have I written custom code (as opposed to using a stock example script provided in Keras):
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 18.04
  • TensorFlow installed from (source or binary): binary
  • TensorFlow version (use command below): TF 2.4.1
  • Python version: 3.8.3
  • Bazel version (if compiling from source):
  • GPU model and memory:
  • Exact command to reproduce:

Describe the problem.


It looks like BaseLogger in Keras broke somewhere in the migration from v1 to v2:

import tensorflow as tf
import numpy as np

model = tf.keras.Sequential()
model.add(tf.keras.layers.Dense(8, input_shape=(5,)))
model.add(tf.keras.layers.Dense(1))
model.compile(optimizer="Adam", loss="binary_crossentropy")
x = np.random.rand(4,5)
y = np.random.randint(0, 2, (4,))
model.fit(x, y, epochs=10, callbacks=[tf.keras.callbacks.BaseLogger()])

I end up getting KeyError: 'metrics'.

I have been hit by this and am trying to understand what the TensorFlow/Keras team intends for BaseLogger.

It looks like BaseLogger is applied by default to all Keras models, but this is only true for TensorFlow v1 models, as indicated here: https://github.com/tensorflow/tensorflow/blob/e104b6a9e87ea5956451ab56c1f5ca486c511bc4/tensorflow/python/keras/callbacks.py#L73-L112. Debugging through this code path, it appears that BaseLogger is no longer applied by default to v2 models.

Also, BaseLogger appears to be severely broken for use with v2 graphs, because there is no 'metrics' entry in self.params: https://github.com/tensorflow/tensorflow/blob/e104b6a9e87ea5956451ab56c1f5ca486c511bc4/tensorflow/python/keras/callbacks.py#L944. I think the right fix is to replace self.params['metrics'] with logs.keys(). However, the averaging logic also seems broken: self.seen is always zero for generator- and dataset-based training, because the batch size it reads is always 0 in those cases. Here https://github.com/tensorflow/tensorflow/blob/e104b6a9e87ea5956451ab56c1f5ca486c511bc4/tensorflow/python/keras/callbacks.py#L927-L930, neither size nor num_steps exists in the logs anymore with v2 graphs, so the defaults of 0 and 1 are used. That makes seen == 0 and leads to ZeroDivisionError: float division by zero.
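To make the two failure modes concrete, here is a TensorFlow-free sketch that mirrors the relevant parts of BaseLogger as I read them in TF 2.4's callbacks.py (stateful_metrics handling omitted). BaseLoggerMirror, run_one_epoch and the use_logs_keys flag are hypothetical names for this illustration, and the params/logs dicts are assumptions based on the v2 behavior described above: no 'metrics' in params, and no 'size' or 'num_steps' in the batch logs.

```python
# Simplified mirror of BaseLogger's epoch/batch hooks, for illustration only.
# It is NOT the Keras class; stateful_metrics handling is left out.


class BaseLoggerMirror:
    def __init__(self, params, use_logs_keys=False):
        self.params = params
        # use_logs_keys=True applies the proposed fix:
        # iterate over logs.keys() instead of self.params['metrics'].
        self.use_logs_keys = use_logs_keys

    def on_epoch_begin(self, epoch, logs=None):
        self.seen = 0
        self.totals = {}

    def on_batch_end(self, batch, logs=None):
        logs = logs or {}
        batch_size = logs.get("size", 0)      # assumed absent in v2 -> 0
        num_steps = logs.get("num_steps", 1)  # assumed absent in v2 -> 1
        self.seen += batch_size * num_steps   # so seen stays 0 in that case
        for k, v in logs.items():
            # Batch-size-weighted running total per metric.
            self.totals[k] = self.totals.get(k, 0) + v * batch_size

    def on_epoch_end(self, epoch, logs=None):
        if logs is not None:
            if self.use_logs_keys:
                keys = list(logs.keys())
            else:
                keys = self.params["metrics"]  # KeyError if 'metrics' missing
            for k in keys:
                if k in self.totals:
                    # Weighted average; fails when seen == 0.
                    logs[k] = self.totals[k] / self.seen


def run_one_epoch(use_logs_keys):
    # Assumed v2-style params and batch logs (the loss value is made up).
    params = {"verbose": 1, "epochs": 10, "steps": 1}
    batch_logs = {"loss": 0.7}
    logger = BaseLoggerMirror(params, use_logs_keys=use_logs_keys)
    logger.on_epoch_begin(0)
    logger.on_batch_end(0, dict(batch_logs))
    logger.on_epoch_end(0, dict(batch_logs))
    return logger


for fix in (False, True):
    try:
        run_one_epoch(fix)
    except (KeyError, ZeroDivisionError) as err:
        print(f"use_logs_keys={fix}: {type(err).__name__}: {err}")
```

Without the fix, the mirror fails with KeyError: 'metrics'; with logs.keys() it gets past that line but then divides by seen == 0. Only when the batch logs actually carry a 'size' does the averaging produce a value.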

Is there a recommended alternative to BaseLogger for v2, then? The documentation (https://www.tensorflow.org/api_docs/python/tf/keras/callbacks/BaseLogger?hl=en) suggests that BaseLogger is still applied by default to every Keras model, v1 or v2, but I disagree: this is no longer true with v2. I never enter this code path unless I add BaseLogger, or a custom callback based on it, explicitly.

This was raised earlier in the TensorFlow repository (https://github.com/tensorflow/tensorflow/issues/46344), but I am puzzled by the lack of activity on that issue. Surely this is a critical bug that should be fixed as soon as possible if BaseLogger is the intended way to customize the log stream; if it is not, what is the alternative?

If I were to ignore BaseLogger completely, what is the recommended way to write a custom Callback that computes the number of samples seen when training from a generator or dataset, where model.fit is not called with batch_size?
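One possible workaround, sketched under assumptions rather than offered as a recommended Keras approach: since the v2 batch logs no longer report batch sizes, count samples where the batches are produced. SampleCounter and counting_batches are hypothetical helpers, not part of Keras, and the wrapper assumes the generator yields (x, y) tuples whose first axis is the batch dimension.

```python
# Hypothetical helpers (not a Keras API): wrap a Python generator of
# (x, y) batches and count how many samples it hands out.


class SampleCounter:
    """Mutable counter that can be shared with a custom callback."""

    def __init__(self):
        self.seen = 0

    def reset(self):
        self.seen = 0


def counting_batches(batches, counter):
    """Yield each (x, y) batch unchanged while adding len(x) to counter.seen."""
    for x, y in batches:
        counter.seen += len(x)  # length of the first axis = samples in batch
        yield x, y


# Plain lists stand in for NumPy arrays: batches of 4, 4 and 2 samples.
data = [
    ([[0.0] * 5] * 4, [0] * 4),
    ([[0.0] * 5] * 4, [1] * 4),
    ([[0.0] * 5] * 2, [0] * 2),
]
counter = SampleCounter()
for _ in counting_batches(data, counter):
    pass
print(counter.seen)  # 10
```

In an actual training run, the wrapped generator would be passed to model.fit and a custom callback would read counter.seen in on_epoch_end (calling counter.reset() in on_epoch_begin). Because Keras may pull batches ahead of the training step, the count can lead the number of samples actually trained on by a few batches, so treat it as approximate. Generators that also yield sample weights would need a three-element unpack.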

Describe the current behavior. KeyError: 'metrics', and, I think, incorrect logic for counting seen samples and averaging.

Describe the expected behavior. BaseLogger works as documented.

Contributing.

  • Do you want to contribute a PR? (yes/no): yes, once I have a better understanding of the intended design of BaseLogger

Standalone code to reproduce the issue. Please see above

Source code / logs.

Please see above

suneeta-mall, Aug 25 '21 10:08

I am able to reproduce the error. Please check the gist here.

The following is the error trace:

Epoch 1/10
1/1 [==============================] - 0s 377ms/step - loss: 7.7125
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-4-a9e266b0dcc1> in <module>()
      8 x = np.random.rand(4,5)
      9 y = np.random.randint(0, 2, (4,))
---> 10 model.fit(x, y, epochs=10, callbacks=[tf.keras.callbacks.BaseLogger()])

1 frames
/usr/local/lib/python3.7/dist-packages/keras/callbacks.py in on_epoch_end(self, epoch, logs)
    920   def on_epoch_end(self, epoch, logs=None):
    921     if logs is not None:
--> 922       for k in self.params['metrics']:
    923         if k in self.totals:
    924           # Make value available to next callbacks.

KeyError: 'metrics'

jvishnuvardhan, Aug 25 '21 18:08

Hello, thank you for reporting an issue.

We're currently in the process of migrating the new Keras 3 code base from keras-team/keras-core to keras-team/keras. Consequently, this issue may not be relevant to the Keras 3 code base. After the migration is successfully completed, feel free to reopen this issue at keras-team/keras if you believe it remains relevant to the Keras 3 code base. If instead this issue is a bug or security issue in legacy tf.keras, you can report a new issue at keras-team/tf-keras, which hosts the TensorFlow-only, legacy version of Keras.

To know more about Keras 3, please read https://keras.io/keras_core/announcement/

SuryanarayanaY, Sep 22 '23 05:09
