keras
TimeDistributed model compatibility
System information.
- Have I written custom code (as opposed to using a stock example script provided in Keras): Yes
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Windows 10 and Ubuntu 18.04
- TensorFlow installed from (source or binary): binary
- TensorFlow version (use command below): 2.10.0a20220604 (nightly)
- Python version: 3.8.10
Describe the problem.
Just came across something rather strange. It seems that some architectures in keras.applications do not work directly with TimeDistributed.
For example, take the three architectures MobileNetV2, MobileNetV3, and ConvNeXt-S. MobileNetV2 works, but the other two do not.
AFAIK, eager mode is enabled by default, so I don't see why this fails here. I also have not compiled or run the model yet, which is where I would set run_eagerly=True if necessary.
Describe the current behavior.
Fails for some models. MobileNetV3 also fails in TF 2.8.0, so it has apparently been an issue for a while.
Describe the expected behavior.
All keras.applications models should behave the same in this scenario.
Potential solution.
I think this has to do with how the output_shape is dynamically fetched, which may not work as intended for some layers. Perhaps layers like TFOpLambda and LayerScale need to implement compute_output_shape for this to work as expected?
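To make the suspected dependency concrete, here is a minimal pure-Python sketch of the shape bookkeeping a TimeDistributed-style wrapper has to do. It is a simplified model under my own assumptions, not the Keras implementation. The function names and the two stand-in callables are hypothetical.

```python
from typing import Callable, Optional, Sequence, Tuple

Shape = Tuple[Optional[int], ...]


def time_distributed_output_shape(
    input_shape: Sequence[Optional[int]],
    inner_output_shape: Callable[[Shape], Shape],
) -> Shape:
    """Static output shape of a TimeDistributed-style wrapper (simplified)."""
    if len(input_shape) < 3:
        raise ValueError("expected at least (batch, time, features)")
    batch, time, *rest = input_shape
    # The wrapped layer sees one timestep at a time, so the time axis is dropped.
    child_input: Shape = (batch, *rest)
    # This step only works if the wrapped layer can report its output shape.
    child_output = tuple(inner_output_shape(child_input))
    # Re-insert the time axis right after the batch axis.
    return (child_output[0], time, *child_output[1:])


def classifier_head(shape: Shape) -> Shape:
    """Hypothetical inner model: maps an image batch to 1000 class scores."""
    return (shape[0], 1000)


def no_shape_inference(shape: Shape) -> Shape:
    """Stand-in for a layer that does not implement compute_output_shape."""
    raise NotImplementedError("compute_output_shape is not implemented")
```

With `classifier_head`, an input shape of `(None, 8, 224, 224, 3)` maps to `(None, 8, 1000)`. With `no_shape_inference`, the same call raises `NotImplementedError`, which mirrors the kind of failure reported here.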
Standalone code to reproduce the issue.
Works for MobileNetV2 but not for the other two.
import keras
from keras.applications import MobileNetV2, MobileNetV3Small, ConvNeXtSmall
input_ = keras.layers.Input(shape=(8, 224, 224, 3))
# base_model = MobileNetV2(include_top=True, input_shape=(224, 224, 3))
base_model = MobileNetV3Small(include_top=True, input_shape=(224, 224, 3))
# base_model = ConvNeXtSmall(include_top=True, input_shape=(224, 224, 3))
output = keras.layers.TimeDistributed(base_model)(input_)
model = keras.Model(inputs=input_, outputs=output)
Source code / logs.
Error prompt for MobileNetV3:
Traceback (most recent call last):
  File ".\test_timedistributed.py", line 12, in <module>
    output = keras.layers.TimeDistributed(base_model)(input_)
  File "C:\Users\47955\workspace\sandbox\venv\lib\site-packages\keras\utils\traceback_utils.py", line 70, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "C:\Users\47955\workspace\sandbox\venv\lib\site-packages\keras\engine\base_layer.py", line 879, in compute_output_shape
    raise NotImplementedError(
NotImplementedError: Exception encountered when calling layer "time_distributed" (type TimeDistributed).
Please run in eager mode or implement the compute_output_shape method on your layer (TFOpLambda).
Call arguments received by layer "time_distributed" (type TimeDistributed):
  • inputs=tf.Tensor(shape=(None, 8, 224, 224, 3), dtype=float32)
  • training=False
  • mask=None
and for ConvNeXtSmall:
Traceback (most recent call last):
  File ".\test_timedistributed.py", line 10, in <module>
    output = keras.layers.TimeDistributed(base_model)(input_)
  File "C:\Users\47955\workspace\sandbox\venv\lib\site-packages\keras\utils\traceback_utils.py", line 70, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "C:\Users\47955\workspace\sandbox\venv\lib\site-packages\keras\engine\base_layer.py", line 879, in compute_output_shape
    raise NotImplementedError(
NotImplementedError: Exception encountered when calling layer "time_distributed" (type TimeDistributed).
Please run in eager mode or implement the compute_output_shape method on your layer (LayerScale).
Call arguments received by layer "time_distributed" (type TimeDistributed):
  • inputs=tf.Tensor(shape=(None, 8, 224, 224, 3), dtype=float32)
  • training=None
  • mask=None
@gowthamkpr, I was able to reproduce the issue in tensorflow v2.9 and nightly. Kindly find the gist of it here.
Triage notes: Probably related to the fact that the keras.applications models do have compute_output_shape implemented on the model.
I'm quite sure that is the case. I am just wondering why this is a problem in the first place. It might be that Keras introduced dynamic output_shape fetching too early, and now a lot of new components break in scenarios where they would have worked if compute_output_shape were implemented as it used to be.
I have only seen this with the TimeDistributed layer, but I am certain that there are lots of other layers and applications where this will break.
But I believe this can be solved by updating the TimeDistributed layer so that it does not depend on the compute_output_shape of the wrapped layers, and instead handles this dynamically, as Keras somehow does elsewhere. It would be great if this was fixed soon, as I need this to work for my research.
Triage notes: Roundrobin to chen.
Any way to use tf.map_fn instead? Same bug
Sorry I don't have enough insights on this one. Will triage in the meeting.
@chenmoneygithub Any update on this?
I was planning on using the new architectures with TimeDistributed in a study, but I am unable to until this is solved.
@andreped a fix for this was pushed yesterday.
If you pip install tf-nightly you can get a version of TensorFlow and Keras with the fix, and the code above should work.
@hertschuh: Just tried running the test script above using the nightly build. No errors prompted for any of the architectures. Brilliant! :]
Will test this on some downstream training and inference pipelines tomorrow, but for now I believe this issue has been solved. @douglas125: Would be great if you could test if the nightly build works for you as well.
Probably a good idea to keep this issue open until TF==2.11 has been released.
Yes, problem solved, well done. I didn't test it using my current prod pipeline (tf==2.8.2 there iirc) but the model now builds correctly.
For the record (using Colab):
Current version:
!pip install tensorflow --upgrade -q
import tensorflow as tf
print(tf.__version__)
backbone = tf.keras.applications.EfficientNetB0(
    include_top=False,
    weights="imagenet",
)
from tensorflow.keras import layers as L
from tensorflow.keras import Model
def get_model():
    # [batch], frames, width, height, 3
    inp = L.Input((None, None, None, 3))
    feats = L.TimeDistributed(backbone)(inp)
    return Model(inputs=inp, outputs=feats)
m = get_model()
output:
2.9.1
---------------------------------------------------------------------------
NotImplementedError Traceback (most recent call last)
<ipython-input-1-0d6a848b8de9> in <module>
16 feats = L.TimeDistributed(backbone)(inp)
17 return Model(inputs=inp, outputs=feats)
---> 18 m = get_model()
2 frames
/usr/local/lib/python3.7/dist-packages/keras/engine/base_layer.py in compute_output_shape(self, input_shape)
827 raise NotImplementedError(
828 'Please run in eager mode or implement the `compute_output_shape` '
--> 829 'method on your layer (%s).' % self.__class__.__name__)
830
831 @doc_controls.for_subclass_implementers
NotImplementedError: Exception encountered when calling layer "time_distributed" (type TimeDistributed).
Please run in eager mode or implement the `compute_output_shape` method on your layer (TFOpLambda).
Call arguments received by layer "time_distributed" (type TimeDistributed):
• inputs=tf.Tensor(shape=(None, None, None, None, 3), dtype=float32)
• training=False
• mask=None
Nightly:
!pip install tf-nightly --upgrade -q
import tensorflow as tf
print(tf.__version__)
backbone = tf.keras.applications.EfficientNetB0(
    include_top=False,
    weights="imagenet",
)
from tensorflow.keras import layers as L
from tensorflow.keras import Model
def get_model():
    # [batch], frames, width, height, 3
    inp = L.Input((None, None, None, 3))
    feats = L.TimeDistributed(backbone)(inp)
    return Model(inputs=inp, outputs=feats)
m = get_model()
x = tf.ones((2, 5, 224, 224, 3))
backbone.inputs, backbone.outputs, m(x).shape
Nightly output:
2.11.0-dev20220816
([<KerasTensor: shape=(None, None, None, 3) dtype=float32 (created by layer 'input_3')>],
[<KerasTensor: shape=(None, None, None, 1280) dtype=float32 (created by layer 'top_activation')>],
TensorShape([2, 5, 7, 7, 1280]))
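The final shape in that output can be checked with a little arithmetic. The sketch below assumes (this is my understanding, not something stated in the thread) that EfficientNetB0 without its top reduces each spatial dimension by a total stride of 32 and ends with 1280 feature channels. Both assumptions are consistent with the printed result. The helper name is hypothetical.

```python
import math
from typing import Optional, Tuple

# Assumed properties of EfficientNetB0 with include_top=False.
TOTAL_STRIDE = 32
FEATURE_CHANNELS = 1280


def expected_feature_shape(
    batch: Optional[int],
    frames: Optional[int],
    height: Optional[int],
    width: Optional[int],
) -> Tuple[Optional[int], ...]:
    """Expected TimeDistributed(backbone) output shape for a video batch."""

    def reduce(size: Optional[int]) -> Optional[int]:
        # Unknown (None) dimensions stay unknown. Rounding up for sizes that
        # are not multiples of 32 is an assumption.
        return None if size is None else math.ceil(size / TOTAL_STRIDE)

    return (batch, frames, reduce(height), reduce(width), FEATURE_CHANNELS)
```

For the input used above, `expected_feature_shape(2, 5, 224, 224)` gives `(2, 5, 7, 7, 1280)`, since 224 / 32 = 7.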
Might be a silly question. I am not that familiar with tf-nightly versus the stable version of TF/Keras. I often have challenges with imports when using nightly, and I experienced the same now.
Instead of importing from tf.keras, I had to add python to the path, i.e. tf.python.keras, which is what I used to do back in the TF==1.13.1 days. I have not had this issue in TF==2.x until testing this nightly. Why is that necessary?
Adam optimizer imports also behave strangely, but importing through tf.python.keras.optimizers.adam_v2.Adam seemed to work with this approach.
Also, I am unable to access the methods within tf.python.keras.mixed_precision. Both mixed_precision.LossScaleOptimizer and mixed_precision.set_global_policy fail.
Am I doing something silly? This seemed to work fine in TF==2.9, but then again, I was testing this on a rather complex environment. What is the correct way of importing when using the nightly? I am importing through tf not keras, btw.
EDIT: I observed this using Python 3.8.10 on an Ubuntu 20.x desktop computer.
I managed to get imports somewhat working by throwing lots of python boys in the imports, such as from tensorflow.python.keras import stuff.
I was able to use the methods within mixed_precision by using tf.keras.mixed_precision.LossScaleOptimizer directly, instead of importing like so: from tensorflow.keras import mixed_precision. Rather strange that this worked in TF==2.9.x and not nightly. Any ideas why?
EDIT: The Adam import through adam_v2 was not compatible with the mixed precision stuff, but if I use Adam directly via tf.keras.optimizers.Adam, without the prior import step, then it works. I did not need the python part either. Rather strange...
I just ran into an issue with TimeDistributed in one of my training runs. Everything starts and runs, but after a while it fails.
This was observed using the MobileNetV3Small architecture in a multiple instance learning setting, where TimeDistributed is commonly used.
Epoch 1/1000
2022-08-17 13:11:46.912208: I tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:426] Loaded cuDNN version 8301
2022-08-17 13:11:47.505551: I tensorflow/tsl/platform/default/subprocess.cc:304] Start cannot spawn child process: No such file or directory
28/1445 [..............................] - ETA: 57:29 - loss: 1.9420 - bag_pred_loss: 0.4043 - bag_pred_f1_score: 0.4432 - bag_pred_focal_loss: 0.1727 - bag_pred_acc: 0.6250 - bag_pred_accuracy: 0.4732
63/1445 [>.............................] - ETA: 50:57 - loss: 1.7378 - bag_pred_loss: 0.4082 - bag_pred_f1_score: 0.4282 - bag_pred_focal_loss: 0.1682 - bag_pred_acc: 0.6587 - bag_pred_accuracy: 0.4749
Traceback (most recent call last):
File "source/main.py", line 454, in <module>
main()
File "source/main.py", line 279, in main
model.fit(
File "/home/andrep/workspace/bcgrade/venv/lib/python3.8/site-packages/keras/utils/traceback_utils.py", line 70, in error_handler
raise e.with_traceback(filtered_tb) from None
File "/home/andrep/workspace/bcgrade/venv/lib/python3.8/site-packages/keras/backend.py", line 5115, in <genexpr>
current_input = tuple(ta.read(time) for ta in input_ta)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Exception encountered when calling layer "time_distributed_1" " f"(type TimeDistributed).
Could not read index 9 twice because it was cleared after a previous read (perhaps try setting clear_after_read = false?)
Call arguments received by layer "time_distributed_1" " f"(type TimeDistributed):
• inputs=tf.Tensor(shape=(4, 20, 64), dtype=float16)
• training=True
• mask=None
terminate called without an active exception
Aborted (core dumped)
@andreped
- About the imports while using nightly: everything should work the same as normal; you shouldn't need tf.python.keras, for instance. Maybe it didn't pick up the Keras nightly at the same time. If you do pip install --upgrade tf-nightly it should work. You can also try to install keras-nightly manually. Lastly, you can try uninstalling tensorflow and keras before you install nightly.
- About the InvalidArgumentError with MobileNetV3Small: can you create a separate issue? This is unrelated to the issue described at the top and I don't want to create confusion. Thanks!
- I will close this issue now; we don't keep them around until TensorFlow is released.
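Before reinstalling, it can help to see which of these packages are actually present. The following is a rough sketch using only the standard library (`importlib.metadata`, Python 3.8+). The helper names and the warning heuristics are my own assumptions, loosely based on the suggestions above, and not an official diagnostic.

```python
from importlib import metadata
from typing import Dict, Iterable, List, Optional

# Distribution names as published on PyPI.
PACKAGES = ("tensorflow", "tf-nightly", "keras", "keras-nightly")


def installed_versions(names: Iterable[str] = PACKAGES) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if absent."""
    versions: Dict[str, Optional[str]] = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def mixed_install_warnings(versions: Dict[str, Optional[str]]) -> List[str]:
    """Heuristic checks for stable/nightly mixes (assumed, not exhaustive)."""
    warnings: List[str] = []
    if versions.get("tensorflow") and versions.get("tf-nightly"):
        warnings.append("both tensorflow and tf-nightly are installed")
    if versions.get("keras") and versions.get("keras-nightly"):
        warnings.append("both keras and keras-nightly are installed")
    if versions.get("tf-nightly") and not versions.get("keras-nightly"):
        warnings.append("tf-nightly is installed without keras-nightly")
    return warnings
```

Calling `mixed_install_warnings(installed_versions())` in the affected environment returns an empty list when none of these mixes is detected.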
No worries! Just close the issue.
I will test the reinstall suggestion and make a new issue for the other stuff I observed.
Thanks for the rapid reply :)
Closing now. This fix will be released with Tensorflow 2.11.
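For scripts that must run on both older and newer installs, a small version gate based on the statement above could look like the sketch below. It only compares the major and minor numbers of a version string such as the ones printed in this thread. The helper names are hypothetical, and a 2.11 dev build dated before the fix would be misclassified as fixed.

```python
import re
from typing import Tuple


def parse_major_minor(version: str) -> Tuple[int, int]:
    """Extract (major, minor) from strings like '2.11.0-dev20220816'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def likely_has_fix(version: str, first_fixed: Tuple[int, int] = (2, 11)) -> bool:
    """True if the version is at least the release said to contain the fix."""
    return parse_major_minor(version) >= first_fixed
```

In practice the argument would be `tf.__version__`; for the versions mentioned in this thread, `2.9.1` and `2.10.0a20220604` evaluate to `False` and `2.11.0-dev20220816` to `True`.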