Shape mismatch error using ResNet
Bug description
When training with a ResNet backbone under certain settings, training fails with a shape mismatch error (`Dimensions must be equal, but are 224 and 216`).
Expected behaviour
Actual behaviour
Your personal set up
- OS: Windows 10 pro
- Version(s):
- SLEAP installation method (listed here):
- [x] Conda from package
- [ ] Conda from source
- [ ] pip package
- [ ] Apple Silicon Macs
Environment packages
# paste output of `pip freeze` or `conda list` here
packages in environment at C:\Users\ONS\anaconda3\envs\sleap:
Name Version Build Channel
absl-py 0.15.0 pypi_0 pypi
aom 3.5.0 h63175ca_0 conda-forge
astunparse 1.6.3 pypi_0 pypi
attrs 21.2.0 pypi_0 pypi
backports-zoneinfo 0.2.1 pypi_0 pypi
bzip2 1.0.8 he774522_0
ca-certificates 2023.05.30 haa95532_0
cached-property 1.5.2 py_0
cachetools 4.2.4 pypi_0 pypi
cattrs 1.1.1 pypi_0 pypi
certifi 2021.10.8 pypi_0 pypi
charset-normalizer 2.0.12 pypi_0 pypi
clang 5.0 pypi_0 pypi
colorama 0.4.6 pypi_0 pypi
commonmark 0.9.1 pypi_0 pypi
cuda-nvcc 11.3.58 hb8d16a4_0 nvidia
cudatoolkit 11.3.1 h59b6b97_2
cudnn 8.2.1 cuda11.3_0
cycler 0.11.0 pypi_0 pypi
dav1d 1.2.1 h2bbff1b_0
efficientnet 1.0.0 pypi_0 pypi
expat 2.5.0 h63175ca_1 conda-forge
ffmpeg 5.1.2 gpl_he426399_111 conda-forge
flatbuffers 1.12 pypi_0 pypi
font-ttf-dejavu-sans-mono 2.37 hd3eb1b0_0
font-ttf-inconsolata 2.001 hcb22688_0
font-ttf-source-code-pro 2.030 hd3eb1b0_0
font-ttf-ubuntu 0.83 h8b1ccd4_0
fontconfig 2.14.2 hbde0cde_0 conda-forge
fonts-anaconda 1 h8fa9717_0
fonts-conda-ecosystem 1 hd3eb1b0_0
fonttools 4.38.0 pypi_0 pypi
freetype 2.12.1 ha860e81_0
gast 0.4.0 pypi_0 pypi
geos 3.9.1 h6c2663c_0
google-auth 1.35.0 pypi_0 pypi
google-auth-oauthlib 0.4.6 pypi_0 pypi
google-pasta 0.2.0 pypi_0 pypi
grpcio 1.44.0 pypi_0 pypi
h5py 3.1.0 nompi_py37h19fda09_100 conda-forge
hdf5 1.10.6 h1756f20_1
hdmf 3.5.2 pypi_0 pypi
icc_rt 2022.1.0 h6049295_2
idna 3.3 pypi_0 pypi
image-classifiers 1.0.0 pypi_0 pypi
imageio 2.15.0 pypi_0 pypi
imgaug 0.4.0 pypi_0 pypi
imgstore 0.2.9 pypi_0 pypi
importlib-metadata 4.11.1 pypi_0 pypi
importlib-resources 5.12.0 pypi_0 pypi
intel-openmp 2023.1.0 h59b6b97_46319
joblib 1.2.0 pypi_0 pypi
jpeg 9e h2bbff1b_1
jsmin 3.0.1 pypi_0 pypi
jsonpickle 1.2 pypi_0 pypi
jsonschema 4.17.3 pypi_0 pypi
keras 2.6.0 pypi_0 pypi
keras-applications 1.0.8 pypi_0 pypi
keras-preprocessing 1.1.2 pypi_0 pypi
kiwisolver 1.4.4 pypi_0 pypi
lcms2 2.12 h83e58a3_0
lerc 3.0 hd77b12b_0
libblas 3.9.0 17_win64_mkl conda-forge
libcblas 3.9.0 17_win64_mkl conda-forge
libdeflate 1.10 h8ffe710_0 conda-forge
libexpat 2.5.0 h63175ca_1 conda-forge
libiconv 1.17 h8ffe710_0 conda-forge
liblapack 3.9.0 17_win64_mkl conda-forge
libopus 1.3.1 h8ffe710_1 conda-forge
libpng 1.6.39 h8cc25b3_0
libtiff 4.3.0 hc4061b1_4 conda-forge
libxml2 2.11.4 hc3477c8_0 conda-forge
libzlib 1.2.13 hcfcfb64_5 conda-forge
m2w64-gcc-libgfortran 5.3.0 6 conda-forge
m2w64-gcc-libs 5.3.0 7 conda-forge
m2w64-gcc-libs-core 5.3.0 7 conda-forge
m2w64-gmp 6.1.0 2 conda-forge
m2w64-libwinpthread-git 5.0.0.4634.697f757 2 conda-forge
markdown 3.3.6 pypi_0 pypi
matplotlib 3.5.3 pypi_0 pypi
mkl 2022.1.0 h6a75c08_874 conda-forge
msys2-conda-epoch 20160418 1 conda-forge
ndx-pose 0.1.1 pypi_0 pypi
networkx 2.6.3 pypi_0 pypi
nixio 1.5.3 pypi_0 pypi
numpy 1.19.5 py37h4c2b6ed_3 conda-forge
oauthlib 3.2.0 pypi_0 pypi
olefile 0.46 py37_0
opencv-python 4.5.5.62 pypi_0 pypi
opencv-python-headless 4.5.5.62 pypi_0 pypi
openh264 2.3.1 h63175ca_2 conda-forge
openjpeg 2.4.0 h4fc8c34_0
openssl 3.0.9 h2bbff1b_0
opt-einsum 3.3.0 pypi_0 pypi
packaging 21.3 pyhd3eb1b0_0
pandas 1.3.5 py37h9386db6_0 conda-forge
pillow 8.4.0 py37hd7d9ad0_0 conda-forge
pip 23.1.2 pyhd8ed1ab_0 conda-forge
pkgutil-resolve-name 1.3.10 pypi_0 pypi
protobuf 4.22.1 pypi_0 pypi
psutil 5.9.4 pypi_0 pypi
pyasn1 0.4.8 pypi_0 pypi
pyasn1-modules 0.2.8 pypi_0 pypi
pygments 2.14.0 pypi_0 pypi
pykalman 0.9.5 pypi_0 pypi
pynwb 2.3.1 pypi_0 pypi
pyparsing 3.0.7 pypi_0 pypi
pyreadline 2.1 py37_1
pyrsistent 0.19.3 pypi_0 pypi
pyside2 5.14.1 pypi_0 pypi
python 3.7.12 h900ac77_100_cpython conda-forge
python-dateutil 2.8.2 pyhd3eb1b0_0
python-rapidjson 1.10 pypi_0 pypi
python_abi 3.7 3_cp37m conda-forge
pytz 2022.7 py37haa95532_0
pytz-deprecation-shim 0.1.0.post0 pypi_0 pypi
pywavelets 1.3.0 pypi_0 pypi
pyzmq 25.0.2 pypi_0 pypi
qimage2ndarray 1.9.0 pypi_0 pypi
qtpy 2.2.0 py37haa95532_0
requests 2.27.1 pypi_0 pypi
requests-oauthlib 1.3.1 pypi_0 pypi
rich 10.16.1 pypi_0 pypi
ruamel-yaml 0.17.21 pypi_0 pypi
ruamel-yaml-clib 0.2.7 pypi_0 pypi
scikit-image 0.19.3 pypi_0 pypi
scikit-learn 1.0.2 pypi_0 pypi
scikit-video 1.1.11 pypi_0 pypi
scipy 1.7.3 py37hb6553fb_0 conda-forge
seaborn 0.12.2 pypi_0 pypi
segmentation-models 1.0.1 pypi_0 pypi
setuptools 59.8.0 py37h03978a9_1 conda-forge
setuptools-scm 6.3.2 pypi_0 pypi
shapely 1.7.1 py37hc520ffa_5 conda-forge
shiboken2 5.14.1 pypi_0 pypi
six 1.15.0 py37haa95532_0
sleap 1.3.0 pypi_0 pypi
sqlite 3.41.2 h2bbff1b_0
svt-av1 1.4.1 h63175ca_0 conda-forge
tbb 2021.8.0 h59b6b97_0
tensorboard 2.6.0 pypi_0 pypi
tensorboard-data-server 0.6.1 pypi_0 pypi
tensorboard-plugin-wit 1.8.1 pypi_0 pypi
tensorflow 2.6.3 pypi_0 pypi
tensorflow-estimator 2.6.0 pypi_0 pypi
tensorflow-hub 0.13.0 pypi_0 pypi
termcolor 1.1.0 pypi_0 pypi
threadpoolctl 3.1.0 pypi_0 pypi
tifffile 2021.11.2 pypi_0 pypi
tk 8.6.12 h2bbff1b_0
tomli 2.0.1 pypi_0 pypi
typing-extensions 3.10.0.2 pypi_0 pypi
tzdata 2022.7 pypi_0 pypi
tzlocal 4.3 pypi_0 pypi
ucrt 10.0.20348.0 haa95532_0
urllib3 1.26.8 pypi_0 pypi
vc 14.2 h21ff451_1
vc14_runtime 14.34.31931 h5081d32_16 conda-forge
vs2015_runtime 14.34.31931 hed1258a_16 conda-forge
werkzeug 2.0.3 pypi_0 pypi
wheel 0.38.4 py37haa95532_0
wrapt 1.12.1 pypi_0 pypi
x264 1!164.3095 h8ffe710_2 conda-forge
x265 3.5 h2d74725_3 conda-forge
xz 5.2.6 h8d14728_0 conda-forge
zipp 3.7.0 pypi_0 pypi
zlib 1.2.13 hcfcfb64_5 conda-forge
zstd 1.5.2 h12be248_6 conda-forge
Logs
# paste relevant logs here, if any
INFO:sleap.nn.training:
INFO:sleap.nn.training:Auto-selected GPU 0 with 16183 MiB of free memory.
INFO:sleap.nn.training:Using GPU 0 for acceleration.
INFO:sleap.nn.training:Disabled GPU memory pre-allocation.
INFO:sleap.nn.training:System:
GPUs: 1/1 available
  Device: /physical_device:GPU:0
    Available: True
    Initalized: False
    Memory growth: True
INFO:sleap.nn.training:
INFO:sleap.nn.training:Initializing trainer...
INFO:sleap.nn.training:Loading training labels from: Z:/KuhnU/Miles-Kuhn/SLEAP/NewModel2.slp
INFO:sleap.nn.training:Creating training and validation splits from validation fraction: 0.1
INFO:sleap.nn.training: Splits: Training = 1416 / Validation = 157.
INFO:sleap.nn.training:Setting up for training...
INFO:sleap.nn.training:Setting up pipeline builders...
INFO:sleap.nn.training:Setting up model...
INFO:sleap.nn.training:Building test pipeline...
2024-05-14 20:16:14.066762: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-05-14 20:16:16.252042: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1510] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 13599 MB memory: -> device: 0, name: NVIDIA RTX A4000, pci bus id: 0000:01:00.0, compute capability: 8.6
2024-05-14 20:16:19.767318: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:185] None of the MLIR Optimization Passes are enabled (registered 2)
INFO:sleap.nn.training:Loaded test example. [273.244s]
INFO:sleap.nn.training: Input shape: (432, 576, 1)
INFO:sleap.nn.training:Created Keras model.
INFO:sleap.nn.training: Backbone: ResNet152(upsampling_stack=UpsamplingStack(output_stride=2, upsampling_stride=2, transposed_conv=False, transposed_conv_filters=64, transposed_conv_filters_rate=1.0, transposed_conv_kernel_size=4, transposed_conv_batchnorm=True, make_skip_connection=False, skip_add=False, refine_convs=2, refine_convs_filters=64, refine_convs_filters_rate=1.0, refine_convs_batchnorm=True), features_output_stride=32, pretrained=True, frozen=True, skip_connections=False, model_name='resnet152', stack_configs=[{'filters': 64, 'blocks': 3, 'stride1': 1, 'name': 'conv2', 'dilation_rate': 1}, {'filters': 128, 'blocks': 8, 'stride1': 2, 'name': 'conv3', 'dilation_rate': 1}, {'filters': 256, 'blocks': 36, 'stride1': 2, 'name': 'conv4', 'dilation_rate': 1}, {'filters': 512, 'blocks': 3, 'stride1': 2, 'name': 'conv5', 'dilation_rate': 1}])
INFO:sleap.nn.training: Max stride: 32
INFO:sleap.nn.training: Parameters: 59,811,915
INFO:sleap.nn.training: Heads:
INFO:sleap.nn.training: [0] = SingleInstanceConfmapsHead(part_names=['eyelid top', 'eyelid bottom', 'nose right', 'nose left', 'spout', 'mouth lip top', 'mouth corner', 'paw right', 'paw left', 'tongue', 'mouth lip bottom'], sigma=2.5, output_stride=2, loss_weight=1.0)
INFO:sleap.nn.training: Outputs:
INFO:sleap.nn.training: [0] = KerasTensor(type_spec=TensorSpec(shape=(None, 224, 288, 11), dtype=tf.float32, name=None), name='SingleInstanceConfmapsHead/BiasAdd:0', description="created by layer 'SingleInstanceConfmapsHead'")
INFO:sleap.nn.training:Training from scratch
INFO:sleap.nn.training:Setting up data pipelines...
INFO:sleap.nn.training:Training set: n = 1416
INFO:sleap.nn.training:Validation set: n = 157
INFO:sleap.nn.training:Setting up optimization...
INFO:sleap.nn.training: OHKM enabled: HardKeypointMiningConfig(online_mining=True, hard_to_easy_ratio=2.0, min_hard_keypoints=2, max_hard_keypoints=None, loss_scale=5.0)
INFO:sleap.nn.training: Learning rate schedule: LearningRateScheduleConfig(reduce_on_plateau=True, reduction_factor=0.5, plateau_min_delta=1e-06, plateau_patience=5, plateau_cooldown=3, min_learning_rate=1e-08)
INFO:sleap.nn.training: Early stopping: EarlyStoppingConfig(stop_training_on_plateau=True, plateau_min_delta=1e-08, plateau_patience=15)
INFO:sleap.nn.training:Setting up outputs...
INFO:sleap.nn.callbacks:Training controller subscribed to: tcp://127.0.0.1:9000 (topic: )
INFO:sleap.nn.training: ZMQ controller subcribed to: tcp://127.0.0.1:9000
INFO:sleap.nn.callbacks:Progress reporter publishing on: tcp://127.0.0.1:9001 for: not_set
INFO:sleap.nn.training: ZMQ progress reporter publish on: tcp://127.0.0.1:9001
INFO:sleap.nn.training:Created run path: Z:/KuhnU/Miles-Kuhn/SLEAP\models\240514_201219.single_instance.n=1573
INFO:sleap.nn.training:Setting up visualization...
C:\Users\ONS\anaconda3\envs\sleap\lib\site-packages\sleap\nn\inference.py:1177: UserWarning: Model input of shape (None, 432, 576, 1) does not divide evenly with output of shape (None, 224, 288, 11).
  f"Model input of shape {model.inputs[input_ind].shape} does not divide "
INFO:sleap.nn.training:Finished trainer set up. [314.1s]
INFO:sleap.nn.training:Creating tf.data.Datasets for training data generation...
INFO:sleap.nn.training:Finished creating training datasets. [5796.8s]
INFO:sleap.nn.training:Starting training loop...
Epoch 1/500
Traceback (most recent call last):
  File "C:\Users\ONS\anaconda3\envs\sleap\Scripts\sleap-train-script.py", line 33, in
    sys.exit(load_entry_point('sleap==1.3.0', 'console_scripts', 'sleap-train')())
  File "C:\Users\ONS\anaconda3\envs\sleap\lib\site-packages\sleap\nn\training.py", line 2014, in main
    trainer.train()
  File "C:\Users\ONS\anaconda3\envs\sleap\lib\site-packages\sleap\nn\training.py", line 943, in train
    verbose=2,
  File "C:\Users\ONS\anaconda3\envs\sleap\lib\site-packages\keras\engine\training.py", line 1184, in fit
    tmp_logs = self.train_function(iterator)
  File "C:\Users\ONS\anaconda3\envs\sleap\lib\site-packages\tensorflow\python\eager\def_function.py", line 885, in call
    result = self._call(*args, **kwds)
  File "C:\Users\ONS\anaconda3\envs\sleap\lib\site-packages\tensorflow\python\eager\def_function.py", line 933, in _call
    self._initialize(args, kwds, add_initializers_to=initializers)
  File "C:\Users\ONS\anaconda3\envs\sleap\lib\site-packages\tensorflow\python\eager\def_function.py", line 760, in _initialize
    *args, **kwds))
  File "C:\Users\ONS\anaconda3\envs\sleap\lib\site-packages\tensorflow\python\eager\function.py", line 3066, in _get_concrete_function_internal_garbage_collected
    graph_function, _ = self._maybe_define_function(args, kwargs)
  File "C:\Users\ONS\anaconda3\envs\sleap\lib\site-packages\tensorflow\python\eager\function.py", line 3463, in _maybe_define_function
    graph_function = self._create_graph_function(args, kwargs)
  File "C:\Users\ONS\anaconda3\envs\sleap\lib\site-packages\tensorflow\python\eager\function.py", line 3308, in _create_graph_function
    capture_by_value=self._capture_by_value),
  File "C:\Users\ONS\anaconda3\envs\sleap\lib\site-packages\tensorflow\python\framework\func_graph.py", line 1007, in func_graph_from_py_func
    func_outputs = python_func(*func_args, **func_kwargs)
  File "C:\Users\ONS\anaconda3\envs\sleap\lib\site-packages\tensorflow\python\eager\def_function.py", line 668, in wrapped_fn
    out = weak_wrapped_fn().wrapped(*args, **kwds)
  File "C:\Users\ONS\anaconda3\envs\sleap\lib\site-packages\tensorflow\python\framework\func_graph.py", line 994, in wrapper
    raise e.ag_error_metadata.to_exception(e)
ValueError: in user code:

    C:\Users\ONS\anaconda3\envs\sleap\lib\site-packages\keras\engine\training.py:853 train_function *
        return step_function(self, iterator)
    C:\Users\ONS\anaconda3\envs\sleap\lib\site-packages\sleap\nn\training.py:303 loss_fn *
        loss += loss_fn(y_gt, y_pr)
    C:\Users\ONS\anaconda3\envs\sleap\lib\site-packages\keras\losses.py:141 __call__ **
        losses = call_fn(y_true, y_pred)
    C:\Users\ONS\anaconda3\envs\sleap\lib\site-packages\keras\losses.py:245 call **
        return ag_fn(y_true, y_pred, **self._fn_kwargs)
    C:\Users\ONS\anaconda3\envs\sleap\lib\site-packages\tensorflow\python\util\dispatch.py:206 wrapper
        return target(*args, **kwargs)
    C:\Users\ONS\anaconda3\envs\sleap\lib\site-packages\keras\losses.py:1204 mean_squared_error
        return backend.mean(tf.math.squared_difference(y_pred, y_true), axis=-1)
    C:\Users\ONS\anaconda3\envs\sleap\lib\site-packages\tensorflow\python\ops\gen_math_ops.py:10514 squared_difference
        "SquaredDifference", x=x, y=y, name=name)
    C:\Users\ONS\anaconda3\envs\sleap\lib\site-packages\tensorflow\python\framework\op_def_library.py:750 _apply_op_helper
        attrs=attr_protos, op_def=op_def)
    C:\Users\ONS\anaconda3\envs\sleap\lib\site-packages\tensorflow\python\framework\func_graph.py:601 _create_op_internal
        compute_device)
    C:\Users\ONS\anaconda3\envs\sleap\lib\site-packages\tensorflow\python\framework\ops.py:3569 _create_op_internal
        op_def=op_def)
    C:\Users\ONS\anaconda3\envs\sleap\lib\site-packages\tensorflow\python\framework\ops.py:2042 __init__
        control_input_ops, op_def)
    C:\Users\ONS\anaconda3\envs\sleap\lib\site-packages\tensorflow\python\framework\ops.py:1883 _create_c_op
        raise ValueError(str(e))

    ValueError: Dimensions must be equal, but are 224 and 216 for '{{node loss_fn/mean_squared_error/SquaredDifference}} = SquaredDifference[T=DT_FLOAT](model/SingleInstanceConfmapsHead/BiasAdd, IteratorGetNext:1)' with input shapes: [15,224,288,11], [15,216,288,?].

INFO:sleap.nn.callbacks:Closing the reporter controller/context.
INFO:sleap.nn.callbacks:Closing the training controller socket/context.
Run Path: Z:/KuhnU/Miles-Kuhn/SLEAP\models\240514_201219.single_instance.n=1573
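One possible reading of the shapes in the log above, offered as a sketch rather than a confirmed diagnosis: the input height of 432 is not a multiple of the backbone's max stride of 32, while the width of 576 is. If we assume the backbone rounds up at each downsampling step and the upsampling stack then scales back to `output_stride=2`, the model output height becomes 224, whereas the ground-truth confidence maps (assumed here to be the input size divided by the output stride) have height 216. The helper names below are hypothetical and are not part of SLEAP.

```python
import math


def predicted_output_size(input_size: int, max_stride: int, output_stride: int) -> int:
    """Model output size along one axis, ASSUMING the backbone rounds up
    when downsampling to max_stride and then upsamples to output_stride."""
    coarsest = math.ceil(input_size / max_stride)
    return coarsest * (max_stride // output_stride)


def target_size(input_size: int, output_stride: int) -> int:
    """Ground-truth confidence map size along one axis, ASSUMING it is
    simply the input size divided by the output stride."""
    return input_size // output_stride


# Values taken from the log: input shape (432, 576, 1), max stride 32,
# head output_stride 2.
height, width = 432, 576
max_stride, output_stride = 32, 2

for name, size in (("height", height), ("width", width)):
    pred = predicted_output_size(size, max_stride, output_stride)
    tgt = target_size(size, output_stride)
    status = "ok" if pred == tgt else "mismatch"
    print(f"{name}: model output {pred}, target {tgt} ({status})")
```

Under these assumptions the height gives 224 against 216 and the width gives 288 against 288, which matches the `[15,224,288,11]` and `[15,216,288,?]` shapes in the error. By the same arithmetic, an input height that is a multiple of 32 (for example 448) would make the two sizes agree.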
Screenshots
How to reproduce
- Go to '...'
- Click on '....'
- Scroll down to '....'
- See error
Hi @milesOIST,
Would you mind uploading a SLEAP package with your training data here so I can try to replicate the issue?
You also mentioned that the error occurs with certain settings. Which settings trigger it?
Thanks!
Elizabeth
This could be related to #1768.