ValueError: The initial value's shape is not compatible with the explicitly supplied `shape` argument
Hi,
I am trying to do semantic segmentation using the Panoptic-DeepLab example (https://github.com/google-research/deeplab2/blob/main/g3doc/projects/panoptic_deeplab.md) and disabling instance prediction in the config file:
instance {
  enable: false
}
See the attached proto file (renamed to .txt so I could upload it here): resnet50_os16_semantic.txt, which is basically the example's config with that change.
I also downloaded the checkpoint resnet50_os16_panoptic_deeplab_coco_train.tar.gz and, after extracting it, set its path in the proto file.
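Concretely, I point the model at the extracted checkpoint via initial_checkpoint, roughly like this (the path is a placeholder for the actual location on my machine; other fields omitted):

model_options {
  # Placeholder path; this points at the checkpoint extracted from
  # resnet50_os16_panoptic_deeplab_coco_train.tar.gz.
  initial_checkpoint: "path/to/extracted/checkpoint"
  ...
}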
I would also like to attach my training annotations (renamed to .txt to be able to upload them here), but they are actually .json.
I am running everything in a Jupyter notebook environment on AWS SageMaker, with a GPU.
I obtain the following error:
I0826 12:24:02.561120 140623071049536 api.py:446] Eval scale 1.0; setting pooling size to [7, 7]
Traceback (most recent call last):
  File "deeplab2/trainer/train.py", line 76, in <module>
    app.run(main)
  File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.7/site-packages/absl/app.py", line 312, in run
    _run_main(main, args)
  File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.7/site-packages/absl/app.py", line 258, in _run_main
    sys.exit(main(argv))
  File "deeplab2/trainer/train.py", line 72, in main
    FLAGS.num_gpus)
  File "/home/ec2-user/SageMaker/dish_segmentation/deeplab2/trainer/train_lib.py", line 201, in run_experiment
    build_deeplab_model(deeplab_model, crop_size)
  File "/home/ec2-user/SageMaker/dish_segmentation/deeplab2/trainer/train_lib.py", line 80, in build_deeplab_model
    tf.keras.Input(input_shape, batch_size=batch_size), training=False)
  File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.7/site-packages/keras/engine/base_layer.py", line 977, in __call__
    input_list)
  File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.7/site-packages/keras/engine/base_layer.py", line 1115, in _functional_construction_call
    inputs, input_masks, args, kwargs)
  File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.7/site-packages/keras/engine/base_layer.py", line 848, in _keras_tensor_symbolic_call
    return self._infer_output_signature(inputs, args, kwargs, input_masks)
  File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.7/site-packages/keras/engine/base_layer.py", line 888, in _infer_output_signature
    outputs = call_fn(inputs, *args, **kwargs)
  File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.7/site-packages/tensorflow/python/autograph/impl/api.py", line 695, in wrapper
    raise e.ag_error_metadata.to_exception(e)
ValueError: in user code:

    /home/ec2-user/SageMaker/dish_segmentation/deeplab2/model/deeplab.py:155 call *
        pred_dict = self._decoder(
    /home/ec2-user/SageMaker/dish_segmentation/deeplab2/model/encoder/axial_resnet.py:764 call *
        current_output, activated_output, memory_feature, endpoints = (
    /home/ec2-user/SageMaker/dish_segmentation/deeplab2/model/encoder/axial_resnet.py:551 call_encoder_before_stacked_decoder *
        current_output = self._stem(inputs)
    /home/ec2-user/SageMaker/dish_segmentation/deeplab2/model/layers/convolutions.py:287 call *
        x = self._conv(x)
    /home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.7/site-packages/keras/engine/base_layer.py:1030 __call__ **
        self._maybe_build(inputs)
    /home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.7/site-packages/keras/engine/base_layer.py:2659 _maybe_build
        self.build(input_shapes) # pylint:disable=not-callable
    /home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.7/site-packages/keras/layers/convolutional.py:204 build
        dtype=self.dtype)
    /home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.7/site-packages/keras/engine/base_layer.py:663 add_weight
        caching_device=caching_device)
    /home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.7/site-packages/tensorflow/python/training/tracking/base.py:818 _add_variable_with_custom_getter
        **kwargs_for_getter)
    /home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.7/site-packages/keras/engine/base_layer_utils.py:129 make_variable
        shape=variable_shape if variable_shape else None)
    /home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.7/site-packages/tensorflow/python/ops/variables.py:266 __call__
        return cls._variable_v1_call(*args, **kwargs)
    /home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.7/site-packages/tensorflow/python/ops/variables.py:227 _variable_v1_call
        shape=shape)
    /home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.7/site-packages/tensorflow/python/ops/variables.py:67 getter
        return captured_getter(captured_previous, **kwargs)
    /home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.7/site-packages/tensorflow/python/distribute/distribute_lib.py:2127 creator_with_resource_vars
        created = self._create_variable(next_creator, **kwargs)
    /home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.7/site-packages/tensorflow/python/distribute/one_device_strategy.py:278 _create_variable
        return next_creator(**kwargs)
    /home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.7/site-packages/tensorflow/python/ops/variables.py:205 <lambda>
        previous_getter = lambda **kwargs: default_variable_creator(None, **kwargs)
    /home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.7/site-packages/tensorflow/python/ops/variable_scope.py:2626 default_variable_creator
        shape=shape)
    /home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.7/site-packages/tensorflow/python/ops/variables.py:270 __call__
        return super(VariableMetaclass, cls).__call__(*args, **kwargs)
    /home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.7/site-packages/tensorflow/python/ops/resource_variable_ops.py:1613 __init__
        distribute_strategy=distribute_strategy)
    /home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.7/site-packages/tensorflow/python/ops/resource_variable_ops.py:1753 _init_from_args
        (initial_value.shape, shape))

    ValueError: The initial value's shape ((7, 7, 3, 64)) is not compatible with the explicitly supplied `shape` argument ((7, 7, 100, 64)).
Note that I set the crop size to 100. After changing the crop size to 3, I get:
ValueError: The initial value's shape ((64,)) is not compatible with the explicitly supplied `shape` argument ((2,)).
I guess that is because it is now iterating over a new image.
I run the training this way:
python deeplab2/trainer/train.py \
--config_file=/home/ec2-user/SageMaker/dish_segmentation/config_files/resnet50_os16_semantic.textproto \
--mode=train \
--model_dir=/home/rcruz/PycharmProjects/dish_segmentation/model \
--num_gpus=1
Could you please give me any clue?
I am a beginner with segmentation models, so I might be making incorrect assumptions.
Thanks a lot!
Hi @rcruzgar,
Thanks for the issue. However, helping you debug this goes beyond our scope. We would suggest running our provided tutorials (e.g., Cityscapes).
Cheers,
Hi! I got the same issue with semantic segmentation. @rcruzgar, maybe you've already solved it?
Thanks a lot!
Hello,
Thanks for reporting the issue. Unfortunately, if you want to train a semantic-only model, you cannot use the trained panoptic checkpoints for initialization (as the error log shows, the job fails to load the trained checkpoint). You need to train a new one yourself.
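In other words, for a semantic-only model, model_options.initial_checkpoint should not point at the panoptic checkpoint; as a minimal sketch:

model_options {
  # Leave this unset (or point it at a non-panoptic checkpoint)
  # instead of the panoptic COCO checkpoint:
  # initial_checkpoint: "path/to/extracted/checkpoint"
  ...
}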
Cheers,
Does that mean there are no pretrained models for semantic-only segmentation in the deeplab2 repo?
Hi, I set restore_semantic_last_layer_from_initial_checkpoint: false in my textproto, like this:

model_options {
  # For me the pretrained model was
  # max_deeplab_l_backbone_os16_axial_deeplab_cityscapes_trainfine/ckpt-60000.
  initial_checkpoint: "path-to-pretrained-model"
  restore_semantic_last_layer_from_initial_checkpoint: false
  ...
}

Then it worked for my own semantic-only dataset.
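For the resnet50_os16 semantic-only setup from the first post, I guess the analogous change would look something like this (placeholder path; other fields omitted, nesting as in the provided configs):

model_options {
  # Placeholder: checkpoint extracted from
  # resnet50_os16_panoptic_deeplab_coco_train.tar.gz.
  initial_checkpoint: "path/to/extracted/checkpoint"
  # Re-initialize the last semantic layer instead of restoring it,
  # so it can match your own set of classes.
  restore_semantic_last_layer_from_initial_checkpoint: false
  ...
  panoptic_deeplab {
    ...
    instance {
      enable: false
    }
  }
}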