
TimeDistributed model compatibility

Open andreped opened this issue 2 years ago • 6 comments

System information.

  • Have I written custom code (as opposed to using a stock example script provided in Keras): Yes
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Windows 10 and Ubuntu 18.04
  • TensorFlow installed from (source or binary): binary
  • TensorFlow version (use command below): 2.10.0a20220604 (nightly)
  • Python version: 3.8.10

Describe the problem.

Just came across something rather strange. It seems that some architectures in keras.applications do not work directly with TimeDistributed.

For example, take the three architectures MobileNetV2, MobileNetV3, and ConvNeXt-Small. MobileNetV2 works, but the other two do not.

AFAIK, eager mode is enabled by default, so I don't see why this fails here. I also have not compiled or run the model yet, which is where I would set run_eagerly=True if necessary.

Describe the current behavior.

Fails for some models. MobileNetV3 also fails in TF 2.8.0, so this has apparently been an issue for a while.

Describe the expected behavior.

The same behaviour should be expected for all keras.applications models in this scenario.

Potential solution.

I think this has to do with how the output_shape is dynamically fetched, which may not work as intended for some layers. Perhaps layers like TFOpLambda and LayerScale require the compute_output_shape method to be implemented to work as expected?
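To illustrate (a minimal sketch using a simplified, hypothetical LayerScale-like layer, not the actual keras.applications code), an elementwise layer only needs to return its input shape for static shape inference to succeed:

import tensorflow as tf

class LayerScale(tf.keras.layers.Layer):
    def __init__(self, init_value=1e-6, **kwargs):
        super().__init__(**kwargs)
        self.init_value = init_value

    def build(self, input_shape):
        # One learnable scale per channel.
        self.gamma = self.add_weight(
            name="gamma",
            shape=(input_shape[-1],),
            initializer=tf.keras.initializers.Constant(self.init_value),
        )

    def call(self, inputs):
        return inputs * self.gamma

    # Elementwise scaling preserves the input shape, so with this method
    # implemented, TimeDistributed no longer has to fall back to eager mode.
    def compute_output_shape(self, input_shape):
        return input_shape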

Standalone code to reproduce the issue.

Works for MobileNetV2 but not for the other two.

import keras
from keras.applications import MobileNetV2, MobileNetV3Small, ConvNeXtSmall

# A batch of 8 frames of 224x224 RGB images.
input_ = keras.layers.Input(shape=(8, 224, 224, 3))

# base_model = MobileNetV2(include_top=True, input_shape=(224, 224, 3))  # works
base_model = MobileNetV3Small(include_top=True, input_shape=(224, 224, 3))  # fails
# base_model = ConvNeXtSmall(include_top=True, input_shape=(224, 224, 3))  # fails

# Apply the classifier to each frame independently.
output = keras.layers.TimeDistributed(base_model)(input_)
model = keras.Model(inputs=input_, outputs=output)

Source code / logs.

Error output for MobileNetV3:

Traceback (most recent call last):
  File ".\test_timedistributed.py", line 12, in <module>
    output = keras.layers.TimeDistributed(base_model)(input_)
  File "C:\Users\47955\workspace\sandbox\venv\lib\site-packages\keras\utils\traceback_utils.py", line 70, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "C:\Users\47955\workspace\sandbox\venv\lib\site-packages\keras\engine\base_layer.py", line 879, in compute_output_shape
    raise NotImplementedError(
NotImplementedError: Exception encountered when calling layer "time_distributed" (type TimeDistributed).

Please run in eager mode or implement the `compute_output_shape` method on your layer (TFOpLambda).

Call arguments received by layer "time_distributed" (type TimeDistributed):
  • inputs=tf.Tensor(shape=(None, 8, 224, 224, 3), dtype=float32)
  • training=False
  • mask=None

and for ConvNeXtSmall:

Traceback (most recent call last):
  File ".\test_timedistributed.py", line 10, in <module>
    output = keras.layers.TimeDistributed(base_model)(input_)
  File "C:\Users\47955\workspace\sandbox\venv\lib\site-packages\keras\utils\traceback_utils.py", line 70, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "C:\Users\47955\workspace\sandbox\venv\lib\site-packages\keras\engine\base_layer.py", line 879, in compute_output_shape
    raise NotImplementedError(
NotImplementedError: Exception encountered when calling layer "time_distributed" (type TimeDistributed).

Please run in eager mode or implement the `compute_output_shape` method on your layer (LayerScale).

Call arguments received by layer "time_distributed" (type TimeDistributed):
  • inputs=tf.Tensor(shape=(None, 8, 224, 224, 3), dtype=float32)
  • training=None
  • mask=None

andreped avatar Jun 05 '22 12:06 andreped

@gowthamkpr, I was able to reproduce the issue in tensorflow v2.9 and nightly. Kindly find the gist of it here.

tilakrayal avatar Jun 06 '22 10:06 tilakrayal

Triage notes: Probably related to the fact that the keras.applications models do have compute_output_shape implemented on the model.

qlzh727 avatar Jun 08 '22 19:06 qlzh727

Triage notes: Probably related to the fact that the keras.applications models do have compute_output_shape implemented on the model.

I'm quite sure that is the case. I am just wondering why this is a problem in the first place. It might be that Keras introduced dynamic output_shape fetching too early, so a lot of newly introduced layers now break in scenarios that would have worked if compute_output_shape were implemented as it used to be.

I have only seen this with the TimeDistributed layer, but I am certain that there are lots of other layers and applications where this will break.

But I believe this can be solved by updating the TimeDistributed layer to not depend on the compute_output_shape of the wrapped layers, and instead handle this dynamically, as Keras somehow does elsewhere. It would be great if this were fixed soon, as I need it to work for my research.

andreped avatar Jun 09 '22 15:06 andreped

Triage notes: Roundrobin to chen.

qlzh727 avatar Jun 09 '22 17:06 qlzh727

Is there any way to use tf.map_fn instead? I am hitting the same bug.
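What I have in mind is roughly the following (an untested sketch; the backbone choice, weights=None, and the shapes are just placeholders):

import tensorflow as tf

# Map a 2D backbone over the time axis manually instead of wrapping it
# in TimeDistributed.
backbone = tf.keras.applications.MobileNetV2(
    include_top=False, pooling="avg", weights=None, input_shape=(224, 224, 3)
)

frames = tf.ones((2, 8, 224, 224, 3))  # (batch, time, H, W, C)

# Move the time axis to the front so map_fn iterates over frames,
# apply the backbone to each (batch, H, W, C) slice, then move it back.
per_frame = tf.map_fn(
    backbone,
    tf.transpose(frames, [1, 0, 2, 3, 4]),
    fn_output_signature=tf.float32,
)
features = tf.transpose(per_frame, [1, 0, 2])
print(features.shape)  # (2, 8, 1280)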

douglas125 avatar Aug 07 '22 16:08 douglas125

Sorry, I don't have enough insight into this one. Will triage it in the meeting.

chenmoneygithub avatar Aug 09 '22 03:08 chenmoneygithub

@chenmoneygithub Any update on this?

I was planning on using the new architectures with TimeDistributed in a study, but I am unable to until this is solved.

andreped avatar Aug 15 '22 09:08 andreped

@andreped a fix for this was pushed yesterday. If you pip install tf-nightly you will get a version of TensorFlow and Keras with the fix, and the code above should work.

hertschuh avatar Aug 16 '22 16:08 hertschuh

@andreped a fix for this was pushed yesterday.

@hertschuh: Just tried running the test script above using the nightly build. No errors were raised for any of the architectures. Brilliant! :]

I will test this on some downstream training and inference pipelines tomorrow, but for now I believe this issue has been solved. @douglas125: It would be great if you could test whether the nightly build works for you as well.

It is probably a good idea to keep this issue open until TF==2.11 has been released.

andreped avatar Aug 16 '22 17:08 andreped

Yes, problem solved, well done. I didn't test it with my current prod pipeline (tf==2.8.2 there, iirc), but the model now builds correctly.

For the record (using Colab):

Current version:

!pip install tensorflow --upgrade -q

import tensorflow as tf
print(tf.__version__)

backbone = tf.keras.applications.EfficientNetB0(
    include_top=False,
    weights="imagenet",
)

from tensorflow.keras import layers as L
from tensorflow.keras import Model
def get_model():
    # [batch], frames, width, height, 3
    inp = L.Input((None, None, None, 3))
    feats = L.TimeDistributed(backbone)(inp)
    return Model(inputs=inp, outputs=feats)
m = get_model()

Output:

2.9.1
---------------------------------------------------------------------------
NotImplementedError                       Traceback (most recent call last)
[<ipython-input-1-0d6a848b8de9>](https://localhost:8080/#) in <module>
     16     feats = L.TimeDistributed(backbone)(inp)
     17     return Model(inputs=inp, outputs=feats)
---> 18 m = get_model()

2 frames
[/usr/local/lib/python3.7/dist-packages/keras/engine/base_layer.py](https://localhost:8080/#) in compute_output_shape(self, input_shape)
    827     raise NotImplementedError(
    828         'Please run in eager mode or implement the `compute_output_shape` '
--> 829         'method on your layer (%s).' % self.__class__.__name__)
    830 
    831   @doc_controls.for_subclass_implementers

NotImplementedError: Exception encountered when calling layer "time_distributed" (type TimeDistributed).

Please run in eager mode or implement the `compute_output_shape` method on your layer (TFOpLambda).

Call arguments received by layer "time_distributed" (type TimeDistributed):
  • inputs=tf.Tensor(shape=(None, None, None, None, 3), dtype=float32)
  • training=False
  • mask=None

Nightly:

!pip install tf-nightly --upgrade -q

import tensorflow as tf
print(tf.__version__)

backbone = tf.keras.applications.EfficientNetB0(
    include_top=False,
    weights="imagenet",
)

from tensorflow.keras import layers as L
from tensorflow.keras import Model
def get_model():
    # [batch], frames, width, height, 3
    inp = L.Input((None, None, None, 3))
    feats = L.TimeDistributed(backbone)(inp)
    return Model(inputs=inp, outputs=feats)
m = get_model()

x = tf.ones((2, 5, 224, 224, 3))

backbone.inputs, backbone.outputs, m(x).shape

Nightly output:

2.11.0-dev20220816
([<KerasTensor: shape=(None, None, None, 3) dtype=float32 (created by layer 'input_3')>],
 [<KerasTensor: shape=(None, None, None, 1280) dtype=float32 (created by layer 'top_activation')>],
 TensorShape([2, 5, 7, 7, 1280]))

douglas125 avatar Aug 16 '22 19:08 douglas125

Might be a silly question. I am not that familiar with tf-nightly versus the stable version of TF/Keras. I have often had challenges with imports when using nightly, and I experienced the same now.

Instead of importing from tf.keras, I had to throw in python, i.e., tf.python.keras, which is what I used to do back in the TF==1.13.1 days. I have not had the same issue in TF==2.x until testing this nightly. Why is that necessary?

Adam optimizer imports also behave very strangely, but importing through tf.python.keras.optimizers.adam_v2.Adam seemed to work with this approach.

Also, I am unable to access the methods within tf.python.keras.mixed_precision. Both mixed_precision.LossScaleOptimizer and mixed_precision.set_global_policy fail.

Am I doing something silly? This seemed to work fine in TF==2.9, but then again, I was testing in a rather complex environment. What is the correct way of importing when using the nightly? I am importing through tf, not keras, btw.


EDIT: I observed this using Python 3.8.10 on an Ubuntu 20.x desktop computer.

andreped avatar Aug 17 '22 09:08 andreped

I managed to get imports somewhat working by throwing python into the imports, i.e., from tensorflow.python.keras import stuff.

I was able to use the methods within mixed_precision by using tf.keras.mixed_precision.LossScaleOptimizer directly, instead of importing it via from tensorflow.keras import mixed_precision. Rather strange that this worked in TF==2.9.x but not in the nightly. Any ideas why?


EDIT: The Adam import through adam_v2 was not compatible with the mixed precision stuff, but if I use Adam directly via tf.keras.optimizers.Adam, without the prior import step, then it works. Did not need the python either. Rather strange...
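For reference, this is the combination that ended up working (a minimal sketch of the tf.keras-only style described above; the policy name and learning rate are just placeholders):

import tensorflow as tf

# Set the global mixed precision policy; no tensorflow.python imports needed.
tf.keras.mixed_precision.set_global_policy("mixed_float16")

optimizer = tf.keras.optimizers.Adam(learning_rate=1e-4)
# Wrap the optimizer so the loss is scaled to avoid float16 underflow.
optimizer = tf.keras.mixed_precision.LossScaleOptimizer(optimizer)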

andreped avatar Aug 17 '22 09:08 andreped

I just got an issue with TimeDistributed in one of my trainings. Everything starts and runs, but after a while it fails. This was observed using the MobileNetV3Small architecture in a multiple-instance learning setting, where TimeDistributed is commonly used.

Epoch 1/1000
2022-08-17 13:11:46.912208: I tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:426] Loaded cuDNN version 8301
2022-08-17 13:11:47.505551: I tensorflow/tsl/platform/default/subprocess.cc:304] Start cannot spawn child process: No such file or directory
  28/1445 [..............................] - ETA: 57:29 - loss: 1.9420 - bag_pred_loss: 0.4043 - bag_pred_f1_score: 0.4432 - bag_pred_focal_loss: 0.1727 - bag_pred_acc: 0.6250 - bag_pred_accuracy: 0.4732
  63/1445 [>.............................] - ETA: 50:57 - loss: 1.7378 - bag_pred_loss: 0.4082 - bag_pred_f1_score: 0.4282 - bag_pred_focal_loss: 0.1682 - bag_pred_acc: 0.6587 - bag_pred_accuracy: 0.4749Traceback (most recent call last):
  File "source/main.py", line 454, in <module>
    main()
  File "source/main.py", line 279, in main
    model.fit(
  File "/home/andrep/workspace/bcgrade/venv/lib/python3.8/site-packages/keras/utils/traceback_utils.py", line 70, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "/home/andrep/workspace/bcgrade/venv/lib/python3.8/site-packages/keras/backend.py", line 5115, in <genexpr>
    current_input = tuple(ta.read(time) for ta in input_ta)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Exception encountered when calling layer "time_distributed_1" (type TimeDistributed).

Could not read index 9 twice because it was cleared after a previous read (perhaps try setting clear_after_read = false?)

Call arguments received by layer "time_distributed_1" (type TimeDistributed):
  • inputs=tf.Tensor(shape=(4, 20, 64), dtype=float16)
  • training=True
  • mask=None
terminate called without an active exception
Aborted (core dumped)

andreped avatar Aug 17 '22 11:08 andreped

@andreped

  • About the imports while using nightly: everything should work the same as normal; you shouldn't need tf.python.keras, for instance. Maybe it didn't pick up keras-nightly at the same time. If you do pip install --upgrade tf-nightly it should work. You can also try to install keras-nightly manually. Lastly, you can try to first uninstall tensorflow and keras before installing nightly (see the commands after this list).
  • About the InvalidArgumentError with MobileNetV3Small: can you create a separate issue? It is unrelated to the issue described at the top and I don't want to create confusion. Thanks!
  • I will close this issue now, we don't keep them around until Tensorflow is released.
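A sketch of the clean-reinstall steps (adapt to your environment):

pip uninstall -y tensorflow keras
pip install --upgrade tf-nightly keras-nightly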

hertschuh avatar Aug 18 '22 18:08 hertschuh

  • I will close this issue now, we don't keep them around until Tensorflow is released.

No worries! Just close the issue.

I will test the reinstall suggestion and open a new issue for the other things I observed.

Thanks for the rapid reply :)

andreped avatar Aug 18 '22 19:08 andreped

Closing now. This fix will be released with Tensorflow 2.11.

hertschuh avatar Aug 18 '22 20:08 hertschuh
