
TypeError: unhashable type: 'ListWrapper' TensorFlow 2.1.0rc0 during training

Open kiflowb777 opened this issue 5 years ago • 24 comments

Python 3.6 TensorFlow: 2.1.0rc0 Keras: 2.2.4-tf

After start training:

 File "C:\project\maskRCNN\model.py", line 349, in compile
    self.keras_model.add_loss(loss)
  File "C:\python36\lib\site-packages\tensorflow_core\python\keras\engine\base_layer.py", line 1081, in add_loss
    self._graph_network_add_loss(symbolic_loss)
  File "C:\python36\lib\site-packages\tensorflow_core\python\keras\engine\network.py", line 1484, in _graph_network_add_loss
    self._insert_layers(new_layers, new_nodes)
  File "C:\python36\lib\site-packages\tensorflow_core\python\keras\engine\network.py", line 1439, in _insert_layers
    layer_set = set(self._layers)
  File "C:\python36\lib\site-packages\tensorflow_core\python\training\tracking\data_structures.py", line 598, in __hash__
    raise TypeError("unhashable type: 'ListWrapper'")
TypeError: unhashable type: 'ListWrapper'
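For context, `ListWrapper` (from `tensorflow.python.training.tracking.data_structures`) is TensorFlow's tracking wrapper around plain lists, and it deliberately raises in `__hash__`. So `set(self._layers)` fails as soon as a tracked list ends up among the layer entries. A minimal sketch of the mechanism, using a hypothetical stand-in class rather than the real TensorFlow one:

```python
class ListWrapper(list):
    # Hypothetical stand-in mimicking TensorFlow's ListWrapper,
    # which raises instead of being hashable.
    def __hash__(self):
        raise TypeError("unhashable type: 'ListWrapper'")

# A tracked list has slipped in among the layer entries.
layers = ["conv1", ListWrapper(["fpn_p2", "fpn_p3"])]

try:
    layer_set = set(layers)  # roughly what _insert_layers does
except TypeError as e:
    print(e)  # unhashable type: 'ListWrapper'
```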

kiflowb777 avatar Dec 02 '19 11:12 kiflowb777

Related topics: https://github.com/tensorflow/tensorflow/issues/34962 https://github.com/tensorflow/tensorflow/issues/33471 https://github.com/tensorflow/tensorflow/issues/32127

kiflowb777 avatar Dec 09 '19 14:12 kiflowb777

Any estimates on this issue?

dankor avatar Dec 11 '19 09:12 dankor

How are you running this with TF 2.0? Are there updates or documentation on the conversion? Am I missing something?

Sorry for such an open question...

taylormcclenny avatar Dec 11 '19 20:12 taylormcclenny

@taylormcclenny Yes, I tried running my Mask R-CNN code with tf.keras on TF 1.14, 1.15, 2.0, and 2.1rc0. More info about this issue here: https://github.com/tensorflow/tensorflow/issues/34962

The "ListWrapper" bug appears after fixing the output layer shape: https://github.com/tensorflow/tensorflow/issues/33785

kiflowb777 avatar Dec 12 '19 10:12 kiflowb777

@kiflowb777 & @dankor - My understanding is that Mask-RCNN won't run on TF 2.0. See the comments posted on this article since TF 2.0's release.

I've been attempting to convert this model to run on TF 2.0, but I just get endless errors. Again, I apologize for a question so much broader than your original post, but I can't find the info elsewhere: is there somewhere else I can look for an updated Mask-RCNN that works (kind of) on TF 2.0?

taylormcclenny avatar Dec 12 '19 14:12 taylormcclenny

It also seems to require heavy rework, rather than a one-shot conversion script that just renames methods. Currently, as I see it, @tomgross is working on the migration, since he has marked this bug here.

dankor avatar Dec 13 '19 07:12 dankor

I found the cause and the solution. This is the responsible tensorflow/keras commit: https://github.com/tensorflow/tensorflow/commit/45df90d5c2d6b125a10cb0809899c254d49412e6#diff-8eb7e20502209f082d0cb15119a50413R781

As documented, you need to wrap the loss function in an empty lambda when adding it to the model. I've added the fix to my TensorFlow 2.0 compatibility PR here: https://github.com/matterport/Mask_RCNN/pull/1896/files#diff-312c7e001d14bbb7ce5f8978f7b04cc3R2171
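The empty-lambda pattern can be sketched without TensorFlow. `MiniModel` below is a hypothetical stand-in for the relevant `keras.Model` bookkeeping, illustrating the idea: a zero-argument callable passed to `add_loss` is simply stored and evaluated later, instead of being traced into the graph, which is the code path that ends up hashing `ListWrapper`:

```python
class MiniModel:
    # Hypothetical stand-in for keras.Model's add_loss bookkeeping.
    def __init__(self):
        self._loss_fns = []

    def add_loss(self, loss):
        # TF 2.x accepts a zero-argument callable; storing the callable
        # sidesteps the graph-insertion path that fails on ListWrapper.
        self._loss_fns.append(loss if callable(loss) else (lambda: loss))

    def total_loss(self):
        # Losses are evaluated lazily, at loss-computation time.
        return sum(fn() for fn in self._loss_fns)

m = MiniModel()
loss_value = 0.25  # stands in for tf.reduce_mean(layer.output, keepdims=True)
m.add_loss(lambda: loss_value)  # the fix: wrap the loss in an empty lambda
print(m.total_loss())  # 0.25
```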

tomgross avatar Dec 16 '19 19:12 tomgross

I think the offending lines might be where these protected variables of keras_model are accessed directly:

self.keras_model._losses = []
self.keras_model._per_input_losses = {}

Removing those allowed me to proceed with training without setting those empty lambdas.

mmalahe avatar Dec 28 '19 11:12 mmalahe

Removing the brackets works well for me.

Change

loss = (tf.reduce_mean(input_tensor=layer.output, keepdims=True))

to

loss = tf.reduce_mean(input_tensor=layer.output, keepdims=True)

travishsu avatar Jan 16 '20 14:01 travishsu

I think the offending lines might be where these protected variables of keras_model are accessed directly:

self.keras_model._losses = []
self.keras_model._per_input_losses = {}

Removing those allowed me to proceed with training without setting those empty lambdas.

When I removed these lines, I got the following error:


File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/ops.py", line 479, in _disallow_in_graph_mode
    " this function with @tf.function.".format(task))
tensorflow.python.framework.errors_impl.OperatorNotAllowedInGraphError: using a `tf.Tensor` as a Python `bool` is not allowed in Graph execution. Use Eager execution or decorate this function with @tf.function. 

mayurmahurkar avatar Dec 08 '20 13:12 mayurmahurkar

Related topics: https://github.com/tensorflow/tensorflow/issues/47309 https://github.com/tensorflow/tensorflow/issues/39702#issuecomment-631750377

kiflowb777 avatar Feb 26 '21 14:02 kiflowb777

@mayurmahurkar Add tf.compat.v1.disable_eager_execution() after import tensorflow as tf

kiflowb777 avatar Feb 26 '21 14:02 kiflowb777

There is an issue, however, when you remove these lines:

self.keras_model._losses = []
self.keras_model._per_input_losses = {}

They were used to prevent duplicated losses. If you remove these lines and do multi-step training, the losses from the previous step won't be cleared and you'll end up with 2x losses.

This is OK as long as you do not need to change the learning rate. Any hint on clearing losses for multi-step training?

Thanks
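The doubling can be reproduced with a minimal mock. `FakeModel` below is hypothetical, not the real Keras class; it only models the loss list, showing how each call to compile() appends the same five losses again when nothing clears the list between training steps:

```python
class FakeModel:
    # Hypothetical stand-in: only models the loss list, nothing else.
    def __init__(self):
        self.losses = []

    def add_loss(self, loss):
        self.losses.append(loss)

model = FakeModel()
loss_names = ["rpn_class_loss", "rpn_bbox_loss", "mrcnn_class_loss",
              "mrcnn_bbox_loss", "mrcnn_mask_loss"]

# Two training steps, each triggering compile(), with no clearing in between.
for training_step in range(2):
    for name in loss_names:
        model.add_loss(name)

print(len(model.losses))  # 10: every loss now counted twice
```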

lovehell avatar Mar 19 '21 13:03 lovehell

@lovehell I have this issue. Did you solve it?

Behnam72 avatar Feb 16 '22 01:02 Behnam72

@Behnam72 I didn't. However, as far as I remember, it does not corrupt your training; it only displays wrong losses.

lovehell avatar Feb 16 '22 21:02 lovehell

@lovehell thanks for the answer. I'd appreciate it if you could answer this:

These are my losses for two epochs (each ran separately with model.train):

epoch 1/1 100/100 [==============================] - 69s 626ms/step - batch: 49.5000 - size: 8.0000 - loss: 1.1028 - rpn_class_loss: 0.0173 - rpn_bbox_loss: 0.3368 - mrcnn_class_loss: 0.2695 - mrcnn_bbox_loss: 0.2328 - mrcnn_mask_loss: 0.2465 - val_loss: 1.7118 - val_rpn_class_loss: 0.0155 - val_rpn_bbox_loss: 0.6753 - val_mrcnn_class_loss: 0.3638 - val_mrcnn_bbox_loss: 0.3188 - val_mrcnn_mask_loss: 0.3385

epoch 2/2 100/100 [==============================] - 34s 230ms/step - batch: 49.5000 - size: 8.0000 - loss: 0.4404 - rpn_class_loss: 0.0062 - rpn_bbox_loss: 0.0626 - mrcnn_class_loss: 0.0484 - mrcnn_bbox_loss: 0.0330 - mrcnn_mask_loss: 0.0699 - val_loss: 3.1889 - val_rpn_class_loss: 0.0167 - val_rpn_bbox_loss: 0.7603 - val_mrcnn_class_loss: 0.2439 - val_mrcnn_bbox_loss: 0.2668 - val_mrcnn_mask_loss: 0.3067

For the second epoch, the sum of the 5 losses in both training and validation is 1/2 of the "loss" and "val_loss". Is this only because I did not empty the losses? If so, then why are the 5 losses okay? Because we had these two lines in TF 1.x:

    self.keras_model._losses = []
    self.keras_model._per_input_losses = {}

They empty both the losses and the per-input losses. Does this mean the per-input losses are also doubled now?
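For what it's worth, the epoch-2 numbers quoted above are consistent with the total being exactly a doubled sum of the five per-head losses:

```python
# Epoch-2 per-head training losses copied from the log above.
parts = [0.0062, 0.0626, 0.0484, 0.0330, 0.0699]
reported_total = 0.4404

print(round(2 * sum(parts), 4))  # 0.4402, matching the doubled total
```

This would fit lovehell's explanation: the displayed total sums every entry in the (duplicated) loss list, while the per-head values are reported separately and so appear only once.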

Behnam72 avatar Feb 16 '22 22:02 Behnam72

https://github.com/matterport/Mask_RCNN/pull/1896/files#diff-312c7e001d14bbb7ce5f8978f7b04cc3R2171

Hey,

I am facing the same error. Could you tell me how you solved it?

SindhuKodali avatar Jun 17 '22 09:06 SindhuKodali

I think I found a solution or workaround... change

# First, clear previously set losses to avoid duplication
self.keras_model._losses = []
self.keras_model._per_input_losses = {}

to

# First, clear previously set losses to avoid duplication
try:
    self.keras_model._losses.clear()
except AttributeError:
    pass
try:
    self.keras_model._per_input_losses.clear()
except AttributeError:
    pass

and also change a few lines afterwards from:

for name in loss_names:
    layer = self.keras_model.get_layer(name)
    if layer.output in self.keras_model.losses:
        continue
    loss = (
        tf.reduce_mean(layer.output, keepdims=True)
        * self.config.LOSS_WEIGHTS.get(name, 1.))
    self.keras_model.add_loss(loss)

to

existing_layer_names = []
for name in loss_names:
    layer = self.keras_model.get_layer(name)
    if layer is None or name in existing_layer_names:
        continue
    existing_layer_names.append(name)
    loss = (tf.reduce_mean(layer.output, keepdims=True)
            * self.config.LOSS_WEIGHTS.get(name, 1.))
    self.keras_model.add_loss(loss)

as well as changing self.keras_model.metrics_tensors.append(loss) to self.keras_model.add_metric(loss, name=name, aggregation='mean')
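The two defensive patterns in that fix can be exercised in plain Python. `Stub` below is a hypothetical object, not the real Keras model: an in-place clear() guarded by try/except tolerates attributes that newer Keras versions no longer define, and the name list deduplicates losses without comparing symbolic tensors:

```python
class Stub:
    # Hypothetical stand-in: only _losses exists, as in newer Keras,
    # where _per_input_losses has been removed.
    def __init__(self):
        self._losses = ["stale_loss"]

model = Stub()
for attr in ("_losses", "_per_input_losses"):
    try:
        getattr(model, attr).clear()  # clear in place, no reassignment
    except AttributeError:
        pass  # attribute absent in this Keras version: nothing to clear

loss_names = ["rpn_class_loss", "rpn_bbox_loss", "rpn_class_loss"]
existing_layer_names = []
for name in loss_names:
    if name in existing_layer_names:
        continue  # skip duplicates by name, not by tensor identity
    existing_layer_names.append(name)

print(model._losses, existing_layer_names)
```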

trueToastedCode avatar Sep 06 '22 10:09 trueToastedCode

as well as self.keras_model.metrics_tensors.append(loss) to self.keras_model.add_metric(loss, name=name, aggregation='mean')

After making this change, I get an error of "unhashable type: 'ListWrapper'". Not sure how to proceed after this.

sampath9875 avatar Nov 14 '22 04:11 sampath9875

@sampath9875 Did you end up finding a working solution?

WesYarber avatar Dec 09 '22 22:12 WesYarber

@sampath9875 Did you end up finding a working solution?

No. I decided to try Detectron's version of Mask RCNN.

sampath9875 avatar Dec 10 '22 05:12 sampath9875

@sampath9875 Did you end up finding a working solution?

No. I decided to try Detectron's version of Mask RCNN.

Does this work on Apple Silicon?

trueToastedCode avatar Dec 11 '22 10:12 trueToastedCode

I believe it should, provided all the required packages are installed. Detectron2 is built on PyTorch, and it also requires a whole list of additional packages.

sampath9875 avatar Dec 12 '22 03:12 sampath9875

I think the offending lines might be where these protected variables of keras_model are accessed directly:

self.keras_model._losses = []
self.keras_model._per_input_losses = {}

Removing those allowed me to proceed with training without setting those empty lambdas.

It works perfectly. Thanks a lot!

gbinduo avatar Nov 09 '23 18:11 gbinduo