Centered instance model scales input image (not cropped image) leading to error
I think the problem is that we generally expect an input scaling of 1.0 for centered instance models since they're crops already. The training does handle this appropriately, but not the visualization for some reason (it's probably missing the input scaling transformer/preprocessing).
In general, I think we can solve this by switching to using the InferenceModel classes to generate visualizations so that we're not doing some custom inference routines inside of Trainer.
Here's the relevant error:
File "D:\anaconda\envs\sleap\lib\site-packages\sleap\nn\", line 280, in on_epoch_end
figure = self.plot_fn()
File "D:\anaconda\envs\sleap\lib\site-packages\sleap\nn\", line 1328, in <lambda>
viz_fn=lambda: visualize_example(next(training_viz_ds_iter)),
File "D:\anaconda\envs\sleap\lib\site-packages\sleap\nn\", line 1308, in visualize_example
preds = find_peaks(tf.expand_dims(example["instance_image"], axis=0))
File "D:\anaconda\envs\sleap\lib\site-packages\keras\engine\", line 1037, in __call__
outputs = call_fn(inputs, *args, **kwargs)
File "D:\anaconda\envs\sleap\lib\site-packages\sleap\nn\", line 1722, in call
out = self.keras_model(crops)
File "D:\anaconda\envs\sleap\lib\site-packages\keras\engine\", line 1020, in __call__
input_spec.assert_input_compatibility(self.input_spec, inputs,
File "D:\anaconda\envs\sleap\lib\site-packages\keras\engine\", line 269, in assert_input_compatibility
', found shape=' + display_shape(x.shape))
ValueError: Input 0 is incompatible with layer model: expected shape=(None, 128, 128, 3), found shape=(1, 32, 32, 3)
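The two shapes in this error are consistent with the crop being rescaled by the configured input scaling before it reaches the model. Here is a minimal sketch of that arithmetic, assuming the scaled side length is simply the crop size multiplied by the input scaling and rounded. `scaled_crop_size` is a hypothetical helper for illustration, not a SLEAP function; the 0.25 is the `input_scaling` value from the job config in the bug report below.

```python
def scaled_crop_size(crop_size: int, input_scaling: float) -> int:
    """Hypothetical helper: side length of a square crop after input scaling."""
    return int(round(crop_size * input_scaling))


# Side length the Keras model was built for, taken from the error message.
expected = 128

# Applying the configured input scaling to the crop gives the shape that was
# actually fed to the model.
found = scaled_crop_size(expected, 0.25)
print(found)  # 32
```

The same relationship holds for the second traceback further down (expected 224, found 168), which would correspond to a scaling of 0.75.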
See issue below for more.
Discussed in
Originally posted by Shifulai July 29, 2022 Thanks for your attention. When I try to train the top-down centered instance model, training does not work when the input scaling is not 1.0. Training stays at epoch 1 while the runtime keeps increasing.
Bug report below
Software versions:
SLEAP: 1.2.6
TensorFlow: 2.6.3
Numpy: 1.19.5
Python: 3.7.12
OS: Windows-10-10.0.19041-SP0
Happy SLEAPing! :)
Using already trained model for centroid: D:/Desktop/CK/sleap/data\models\220729_134535.centroid.n=765\training_config.json
Resetting monitor window.
Polling: D:/Desktop/CK/sleap/data\models\220729_194813.centered_instance.n=765\viz\validation.*.png
Start training centered_instance...
['sleap-train', 'C:\\Users\\admin\\AppData\\Local\\Temp\\tmp1aqtnvzl\\220729_194813_training_job.json', 'D:/Desktop/CK/sleap/data/food competition.slp', '--zmq', '--save_viz']
SLEAP: 1.2.6
TensorFlow: 2.6.3
Numpy: 1.19.5
Python: 3.7.12
OS: Windows-10-10.0.19041-SP0 labels file: D:/Desktop/CK/sleap/data/food competition.slp profile: C:\Users\admin\AppData\Local\Temp\tmp1aqtnvzl\220729_194813_training_job.json{
"training_job_path": "C:\\Users\\admin\\AppData\\Local\\Temp\\tmp1aqtnvzl\\220729_194813_training_job.json",
"labels_path": "D:/Desktop/CK/sleap/data/food competition.slp",
"video_paths": [
"val_labels": null,
"test_labels": null,
"tensorboard": false,
"save_viz": true,
"zmq": true,
"run_name": "",
"prefix": "",
"suffix": "",
"cpu": false,
"first_gpu": false,
"last_gpu": false,
"gpu": 0
} job:{
"data": {
"labels": {
"training_labels": null,
"validation_labels": null,
"validation_fraction": 0.1,
"test_labels": null,
"split_by_inds": false,
"training_inds": null,
"validation_inds": null,
"test_inds": null,
"search_path_hints": [],
"skeletons": []
"preprocessing": {
"ensure_rgb": false,
"ensure_grayscale": false,
"imagenet_mode": null,
"input_scaling": 0.25,
"pad_to_stride": null,
"resize_and_pad_to_target": true,
"target_height": null,
"target_width": null
"instance_cropping": {
"center_on_part": "tail",
"crop_size": null,
"crop_size_detection_padding": 16
"model": {
"backbone": {
"leap": null,
"unet": {
"stem_stride": null,
"max_stride": 16,
"output_stride": 8,
"filters": 16,
"filters_rate": 1.5,
"middle_block": true,
"up_interpolate": true,
"stacks": 1
"hourglass": null,
"resnet": null,
"pretrained_encoder": null
"heads": {
"single_instance": null,
"centroid": null,
"centered_instance": {
"anchor_part": "tail",
"part_names": null,
"sigma": 2.5,
"output_stride": 8,
"loss_weight": 1.0,
"offset_refinement": false
"multi_instance": null,
"multi_class_bottomup": null,
"multi_class_topdown": null
"optimization": {
"preload_data": true,
"augmentation_config": {
"rotate": true,
"rotation_min_angle": -180.0,
"rotation_max_angle": 180.0,
"translate": false,
"translate_min": -5,
"translate_max": 5,
"scale": false,
"scale_min": 0.9,
"scale_max": 1.1,
"uniform_noise": false,
"uniform_noise_min_val": 0.0,
"uniform_noise_max_val": 10.0,
"gaussian_noise": false,
"gaussian_noise_mean": 5.0,
"gaussian_noise_stddev": 1.0,
"contrast": true,
"contrast_min_gamma": 0.5,
"contrast_max_gamma": 2.0,
"brightness": true,
"brightness_min_val": 0.0,
"brightness_max_val": 10.0,
"random_crop": false,
"random_crop_height": 256,
"random_crop_width": 256,
"random_flip": false,
"flip_horizontal": true
"online_shuffling": true,
"shuffle_buffer_size": 128,
"prefetch": true,
"batch_size": 4,
"batches_per_epoch": null,
"min_batches_per_epoch": 200,
"val_batches_per_epoch": null,
"min_val_batches_per_epoch": 10,
"epochs": 200,
"optimizer": "adam",
"initial_learning_rate": 0.0001,
"learning_rate_schedule": {
"reduce_on_plateau": true,
"reduction_factor": 0.5,
"plateau_min_delta": 1e-06,
"plateau_patience": 5,
"plateau_cooldown": 3,
"min_learning_rate": 1e-08
"hard_keypoint_mining": {
"online_mining": false,
"hard_to_easy_ratio": 2.0,
"min_hard_keypoints": 2,
"max_hard_keypoints": null,
"loss_scale": 5.0
"early_stopping": {
"stop_training_on_plateau": true,
"plateau_min_delta": 1e-08,
"plateau_patience": 20
"outputs": {
"save_outputs": true,
"run_name": "220729_194813.centered_instance.n=765",
"run_name_prefix": "",
"run_name_suffix": "",
"runs_folder": "D:/Desktop/CK/sleap/data\\models",
"tags": [
"save_visualizations": true,
"delete_viz_images": true,
"zip_outputs": false,
"log_to_csv": true,
"checkpointing": {
"initial_model": false,
"best_model": true,
"every_epoch": false,
"latest_model": false,
"final_model": false
"tensorboard": {
"write_logs": false,
"loss_frequency": "epoch",
"architecture_graph": false,
"profile_graph": false,
"visualizations": true
"zmq": {
"subscribe_to_controller": true,
"controller_address": "tcp://",
"controller_polling_timeout": 10,
"publish_updates": true,
"publish_address": "tcp://"
"name": "",
"description": "",
"sleap_version": "1.2.6",
"filename": "C:\\Users\\admin\\AppData\\Local\\Temp\\tmp1aqtnvzl\\220729_194813_training_job.json"
} GPU 0 for acceleration. GPU memory pre-allocation.
GPUs: 1/1 available
Device: /physical_device:GPU:0
Available: True
Initalized: False
Memory growth: True trainer... training labels from: D:/Desktop/CK/sleap/data/food competition.slp training and validation splits from validation fraction: 0.1 Splits: Training = 689 / Validation = 76. up for training... up pipeline builders... up model... test pipeline...
2022-07-29 19:48:31.149710: I tensorflow/core/platform/] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-07-29 19:48:33.014129: I tensorflow/core/common_runtime/gpu/] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 3489 MB memory: -> device: 0, name: NVIDIA GeForce RTX 3060 Laptop GPU, pci bus id: 0000:01:00.0, compute capability: 8.6
2022-07-29 19:48:35.801343: I tensorflow/compiler/mlir/] None of the MLIR Optimization Passes are enabled (registered 2)
2022-07-29 19:48:47.065434: W tensorflow/core/grappler/costs/] Error in PredictCost() for the op: op: "CropAndResize" attr { key: "T" value { type: DT_FLOAT } } attr { key: "extrapolation_value" value { f: 0 } } attr { key: "method" value { s: "bilinear" } } inputs { dtype: DT_FLOAT shape { dim { size: 1 } dim { size: 480 } dim { size: 270 } dim { size: 3 } } } inputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: 4 } } } inputs { dtype: DT_INT32 shape { dim { size: -2 } } } inputs { dtype: DT_INT32 shape { dim { size: 2 } } } device { type: "CPU" vendor: "GenuineIntel" model: "101" frequency: 2304 num_cores: 16 environment { key: "cpu_instruction_set" value: "SSE, SSE2" } environment { key: "eigen" value: "3.3.90" } l1_cache_size: 32768 l2_cache_size: 262144 l3_cache_size: 16777216 memory_size: 268435456 } outputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: 128 } dim { size: 128 } dim { size: 3 } } } test example. [22.266s] Input shape: (128, 128, 3) Keras model. Backbone: UNet(stacks=1, filters=16, filters_rate=1.5, kernel_size=3, stem_kernel_size=7, convs_per_block=2, stem_blocks=0, down_blocks=4, middle_block=True, up_blocks=1, up_interpolate=True, block_contraction=False) Max stride: 16 Parameters: 265,575 Heads: [0] = CenteredInstanceConfmapsHead(part_names=['nose', 'hear_r', 'hear_l', 'tail'], anchor_part='tail', sigma=2.5, output_stride=8, loss_weight=1.0) Outputs: [0] = KerasTensor(type_spec=TensorSpec(shape=(None, 16, 16, 4), dtype=tf.float32, name=None), name='CenteredInstanceConfmapsHead/BiasAdd:0', description="created by layer 'CenteredInstanceConfmapsHead'") up data pipelines... set: n = 689 set: n = 76 up optimization... Learning rate schedule: LearningRateScheduleConfig(reduce_on_plateau=True, reduction_factor=0.5, plateau_min_delta=1e-06, plateau_patience=5, plateau_cooldown=3, min_learning_rate=1e-08) Early stopping: EarlyStoppingConfig(stop_training_on_plateau=True, plateau_min_delta=1e-08, plateau_patience=20) up outputs...
INFO:sleap.nn.callbacks:Training controller subscribed to: tcp:// (topic: ) ZMQ controller subcribed to: tcp://
INFO:sleap.nn.callbacks:Progress reporter publishing on: tcp:// for: not_set ZMQ progress reporter publish on: tcp:// run path: D:/Desktop/CK/sleap/data\models\220729_194813.centered_instance.n=765 up visualization...
2022-07-29 19:48:59.507634: W tensorflow/core/grappler/costs/] Error in PredictCost() for the op: op: "CropAndResize" attr { key: "T" value { type: DT_FLOAT } } attr { key: "extrapolation_value" value { f: 0 } } attr { key: "method" value { s: "bilinear" } } inputs { dtype: DT_FLOAT shape { dim { size: 1 } dim { size: 1920 } dim { size: 1080 } dim { size: 3 } } } inputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: 4 } } } inputs { dtype: DT_INT32 shape { dim { size: -2 } } } inputs { dtype: DT_INT32 shape { dim { size: 2 } } } device { type: "CPU" vendor: "GenuineIntel" model: "101" frequency: 2304 num_cores: 16 environment { key: "cpu_instruction_set" value: "SSE, SSE2" } environment { key: "eigen" value: "3.3.90" } l1_cache_size: 32768 l2_cache_size: 262144 l3_cache_size: 16777216 memory_size: 268435456 } outputs { dtype: DT_FLOAT shape { dim { size: -2 }
dim { size: 128 } dim { size: 128 } dim { size: 3 } } }
2022-07-29 19:49:07.684222: W tensorflow/core/grappler/costs/] Error in PredictCost() for the op: op: "CropAndResize" attr { key: "T" value { type: DT_FLOAT } } attr { key: "extrapolation_value" value { f: 0 } } attr { key: "method" value { s: "bilinear" } } inputs { dtype: DT_FLOAT shape { dim { size: 1 } dim { size: 1920 } dim { size: 1080 } dim { size: 3 } } } inputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: 4 } } } inputs { dtype: DT_INT32 shape { dim { size: -2 } } } inputs { dtype: DT_INT32 shape { dim { size: 2 } } } device { type: "CPU" vendor: "GenuineIntel" model: "101" frequency: 2304 num_cores: 16 environment { key: "cpu_instruction_set" value: "SSE, SSE2" } environment { key: "eigen" value: "3.3.90" } l1_cache_size: 32768 l2_cache_size: 262144 l3_cache_size: 16777216 memory_size: 268435456 } outputs { dtype: DT_FLOAT shape { dim { size: -2 }
dim { size: 128 } dim { size: 128 } dim { size: 3 } } } trainer set up. [41.8s] for training data generation...
2022-07-29 19:55:14.233259: W tensorflow/core/grappler/costs/] Error in PredictCost() for the op: op: "CropAndResize" attr { key: "T" value { type: DT_FLOAT } } attr { key: "extrapolation_value" value { f: 0 } } attr { key: "method" value { s: "bilinear" } } inputs { dtype: DT_FLOAT shape { dim { size: 1 } dim { size: 480 } dim { size: 270 } dim { size: 3 } } } inputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: 4 } } } inputs { dtype: DT_INT32 shape { dim { size: -2 } } } inputs { dtype: DT_INT32 shape { dim { size: 2 } } } device { type: "CPU" vendor: "GenuineIntel" model: "101" frequency: 2304 num_cores: 16 environment { key: "cpu_instruction_set" value: "SSE, SSE2" } environment { key: "eigen" value: "3.3.90" } l1_cache_size: 32768 l2_cache_size: 262144 l3_cache_size: 16777216 memory_size: 268435456 } outputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: 128 } dim { size: 128 } dim { size: 3 } } }
2022-07-29 19:55:31.551806: W tensorflow/core/grappler/costs/] Error in PredictCost() for the op: op: "CropAndResize" attr { key: "T" value { type: DT_FLOAT } } attr { key: "extrapolation_value" value { f: 0 } } attr { key: "method" value { s: "bilinear" } } inputs { dtype: DT_FLOAT shape { dim { size: 1 } dim { size: 480 } dim { size: 270 } dim { size: 3 } } } inputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: 4 } } } inputs { dtype: DT_INT32 shape { dim { size: -2 } } } inputs { dtype: DT_INT32 shape { dim { size: 2 } } } device { type: "CPU" vendor: "GenuineIntel" model: "101" frequency: 2304 num_cores: 16 environment { key: "cpu_instruction_set" value: "SSE, SSE2" } environment { key: "eigen" value: "3.3.90" } l1_cache_size: 32768 l2_cache_size: 262144 l3_cache_size: 16777216 memory_size: 268435456 } outputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: 128 } dim { size: 128 } dim { size: 3 } } } creating training datasets. [384.0s] training loop...
2022-07-29 19:55:32.101723: W tensorflow/core/grappler/costs/] Error in PredictCost() for the op: op: "CropAndResize" attr { key: "T" value { type: DT_FLOAT } } attr { key: "extrapolation_value" value { f: 0 } } attr { key: "method" value { s: "bilinear" } } inputs { dtype: DT_FLOAT shape { dim { size: 1 } dim { size: 480 } dim { size: 270 } dim { size: 3 } } } inputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: 4 } } } inputs { dtype: DT_INT32 shape { dim { size: -2 } } } inputs { dtype: DT_INT32 shape { dim { size: 2 } } } device { type: "CPU" vendor: "GenuineIntel" model: "101" frequency: 2304 num_cores: 16 environment { key: "cpu_instruction_set" value: "SSE, SSE2" } environment { key: "eigen" value: "3.3.90" } l1_cache_size: 32768 l2_cache_size: 262144 l3_cache_size: 16777216 memory_size: 268435456 } outputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: 128 } dim { size: 128 } dim { size: 3 } } }
Epoch 1/200
2022-07-29 19:55:33.962928: I tensorflow/stream_executor/cuda/] Loaded cuDNN version 8201
WARNING:tensorflow:Callback method `on_train_batch_end` is slow compared to the batch time (batch time: 0.0000s vs `on_train_batch_end` time: 0.0156s). Check your callbacks.
2022-07-29 19:55:58.738032: W tensorflow/core/grappler/costs/] Error in PredictCost() for the op: op: "CropAndResize" attr { key: "T" value { type: DT_FLOAT } } attr { key: "extrapolation_value" value { f: 0 } } attr { key: "method" value { s: "bilinear" } } inputs { dtype: DT_FLOAT shape { dim { size: 1 } dim { size: 480 } dim { size: 270 } dim { size: 3 } } } inputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: 4 } } } inputs { dtype: DT_INT32 shape { dim { size: -2 } } } inputs { dtype: DT_INT32 shape { dim { size: 2 } } } device { type: "CPU" vendor: "GenuineIntel" model: "101" frequency: 2304 num_cores: 16 environment { key: "cpu_instruction_set" value: "SSE, SSE2" } environment { key: "eigen" value: "3.3.90" } l1_cache_size: 32768 l2_cache_size: 262144 l3_cache_size: 16777216 memory_size: 268435456 } outputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: 128 } dim { size: 128 } dim { size: 3 } } }
344/344 - 30s - loss: 0.0325 - nose: 0.0373 - hear_r: 0.0405 - hear_l: 0.0394 - tail: 0.0128 - val_loss: 0.0242 - val_nose: 0.0294 - val_hear_r: 0.0325 - val_hear_l: 0.0307 - val_tail: 0.0041
Traceback (most recent call last):
File "D:\anaconda\envs\sleap\Scripts\", line 33, in <module>
sys.exit(load_entry_point('sleap==1.2.6', 'console_scripts', 'sleap-train')())
File "D:\anaconda\envs\sleap\lib\site-packages\sleap\nn\", line 1955, in main
File "D:\anaconda\envs\sleap\lib\site-packages\sleap\nn\", line 923, in train
File "D:\anaconda\envs\sleap\lib\site-packages\keras\engine\", line 1230, in fit
callbacks.on_epoch_end(epoch, epoch_logs)
File "D:\anaconda\envs\sleap\lib\site-packages\keras\", line 413, in on_epoch_end
callback.on_epoch_end(epoch, logs)
File "D:\anaconda\envs\sleap\lib\site-packages\sleap\nn\", line 280, in on_epoch_end
figure = self.plot_fn()
File "D:\anaconda\envs\sleap\lib\site-packages\sleap\nn\", line 1328, in <lambda>
viz_fn=lambda: visualize_example(next(training_viz_ds_iter)),
File "D:\anaconda\envs\sleap\lib\site-packages\sleap\nn\", line 1308, in visualize_example
preds = find_peaks(tf.expand_dims(example["instance_image"], axis=0))
File "D:\anaconda\envs\sleap\lib\site-packages\keras\engine\", line 1037, in __call__
outputs = call_fn(inputs, *args, **kwargs)
File "D:\anaconda\envs\sleap\lib\site-packages\sleap\nn\", line 1722, in call
out = self.keras_model(crops)
File "D:\anaconda\envs\sleap\lib\site-packages\keras\engine\", line 1020, in __call__
input_spec.assert_input_compatibility(self.input_spec, inputs,
File "D:\anaconda\envs\sleap\lib\site-packages\keras\engine\", line 269, in assert_input_compatibility
', found shape=' + display_shape(x.shape))
ValueError: Input 0 is incompatible with layer model: expected shape=(None, 128, 128, 3), found shape=(1, 32, 32, 3)
INFO:sleap.nn.callbacks:Closing the reporter controller/context.
INFO:sleap.nn.callbacks:Closing the training controller socket/context.
Can we just use the output stride of the centroid model to do a limited version of input scaling on the centered instance model (finite set of feasible stride values)? -> This would couple the centroid and centered instance models though, which we might not want.
Problem Analysis
It seems that the keras_model being used expects the same input shape as the pre-scaled input. We should initialize the Keras model to expect the scaled input shape.
Relevant Code
- Set up the keras model
- Make the pipeline using the preprocessing from . Note that the Resizer is resizing the original uncropped image. To resize the cropped image, we should move the Resizer after the InstanceCropper transform.
- The visualization pipeline for TopDown should use instead of rewriting everything (also the base pipeline includes the Resizer).
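To make the ordering issue concrete, here is a shape-only sketch of the two possible orders of resizing and cropping. The `resize` and `crop` functions are hypothetical stand-ins that only track (height, width); they are not the SLEAP Resizer or InstanceCropper, and the frame size, scale and crop size are taken from the logs in this issue.

```python
from typing import Tuple

Shape = Tuple[int, int]  # (height, width)


def resize(shape: Shape, scale: float) -> Shape:
    """Stand-in for a resizing step: scales height and width."""
    return (int(round(shape[0] * scale)), int(round(shape[1] * scale)))


def crop(shape: Shape, crop_size: int) -> Shape:
    """Stand-in for instance cropping: always yields a square crop."""
    return (crop_size, crop_size)


def resize_then_crop(image: Shape, scale: float, crop_size: int) -> Shape:
    # Scaling is applied to the full, uncropped frame; the crop keeps its size.
    return crop(resize(image, scale), crop_size)


def crop_then_resize(image: Shape, scale: float, crop_size: int) -> Shape:
    # Scaling is applied to the crop itself, so the model input shrinks.
    return resize(crop(image, crop_size), scale)


full_frame = (1080, 1920)
print(resize_then_crop(full_frame, 0.25, 128))  # (128, 128)
print(crop_then_resize(full_frame, 0.25, 128))  # (32, 32)
```

Whichever order is chosen, the model has to be built for the shape that order produces; the error above is what happens when one path yields 128 and the other yields 32.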
Follow-up problems
- After moving the after the InstanceCropper, we also need a way of passing in the points_keys.
- We need to make some changes to #841 s.t. the must be divisible by the max stride for the TopDown (centered instance).
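A divisibility requirement of this kind could be checked along the following lines. This is a sketch only: it assumes the quantity being checked is the scaled crop side length, and `is_stride_compatible` and `round_up_to_stride` are hypothetical names, not SLEAP functions. The max stride values 16 and 32 are the ones that appear in the two job configs in this issue.

```python
def is_stride_compatible(size: int, max_stride: int) -> bool:
    """True if the side length can pass through the network's downsampling."""
    return size % max_stride == 0


def round_up_to_stride(size: int, max_stride: int) -> int:
    """Smallest multiple of max_stride that is greater than or equal to size."""
    return ((size + max_stride - 1) // max_stride) * max_stride


# A 128 px crop scaled by 0.25 gives 32 px, which already fits a max stride of 16.
print(is_stride_compatible(32, 16))   # True

# A 224 px crop scaled by 0.75 gives 168 px, which does not fit and needs padding.
print(is_stride_compatible(168, 16))  # False
print(round_up_to_stride(168, 16))    # 176
print(round_up_to_stride(168, 32))    # 192
```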
Traceback (most recent call last):
File "C:\Users\TalmoLab\miniconda3\envs\sleap_convert-naming\Scripts\", line 33, in <module>
sys.exit(load_entry_point('sleap', 'console_scripts', 'sleap-train')())
File "d:\social-leap-estimates-animal-poses\pull-requests\sleap_convert-naming\sleap\sleap\nn\", line 1955, in main trainer.train()
File "d:\social-leap-estimates-animal-poses\pull-requests\sleap_convert-naming\sleap\sleap\nn\", line 923, in train verbose=2,
File "C:\Users\TalmoLab\miniconda3\envs\sleap_convert-naming\lib\site-packages\keras\utils\", line 67, in error_handler
raise e.with_traceback(filtered_tb) from None
File "d:\social-leap-estimates-animal-poses\pull-requests\sleap_convert-naming\sleap\sleap\nn\", line 280, in on_epoch_end
figure = self.plot_fn()
File "d:\social-leap-estimates-animal-poses\pull-requests\sleap_convert-naming\sleap\sleap\nn\", line 1328, in <lambda>
viz_fn=lambda: visualize_example(next(training_viz_ds_iter)),
File "d:\social-leap-estimates-animal-poses\pull-requests\sleap_convert-naming\sleap\sleap\nn\", line 1308, in visualize_example
preds = find_peaks(tf.expand_dims(example["instance_image"], axis=0))
File "d:\social-leap-estimates-animal-poses\pull-requests\sleap_convert-naming\sleap\sleap\nn\", line 1723, in call
out = self.keras_model(crops)
ValueError: Exception encountered when calling layer "find_instance_peaks" (type FindInstancePeaks).
Input 0 of layer "model" is incompatible with the layer: expected shape=(None, 224, 224, 1), found shape=(1, 168, 168, 1)
Call arguments received:
• inputs=tf.Tensor(shape=(1, 224, 224, 1), dtype=float32)
I too think I am now experiencing this issue; however, I am not sure why it is coming up for me now during training when I have been training on the same model for weeks. I have tried reinstalling SLEAP v1.2.9 and paying Google Colab for more compute capability (per discussion #871). Below is the dialogue, which appears similar to what @talmo posted, but the TF error comes up at the test pipeline... before the visualization set up. Note that I also disabled visualizations from the Run Training dialogue, yet the issue still came up:
INFO:numexpr.utils:Note: NumExpr detected 12 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 8.
INFO:numexpr.utils:NumExpr defaulting to 8 threads.
SLEAP: 1.2.9
TensorFlow: 2.8.4
Numpy: 1.21.6
Python: 3.8.10
OS: Linux-5.10.147+-x86_64-with-glibc2.29 labels file: resolved_skeletons_with_predictions.pkg.slp profile: centered_instance.json{
"training_job_path": "centered_instance.json",
"labels_path": "resolved_skeletons_with_predictions.pkg.slp",
"video_paths": [
"val_labels": null,
"test_labels": null,
"tensorboard": false,
"save_viz": false,
"zmq": false,
"run_name": "",
"prefix": "",
"suffix": "",
"cpu": false,
"first_gpu": false,
"last_gpu": false,
"gpu": "auto"
} job:{
"data": {
"labels": {
"training_labels": null,
"validation_labels": null,
"validation_fraction": 0.1,
"test_labels": null,
"split_by_inds": false,
"training_inds": null,
"validation_inds": null,
"test_inds": null,
"search_path_hints": [],
"skeletons": []
"preprocessing": {
"ensure_rgb": false,
"ensure_grayscale": false,
"imagenet_mode": null,
"input_scaling": 1.0,
"pad_to_stride": null,
"resize_and_pad_to_target": true,
"target_height": null,
"target_width": null
"instance_cropping": {
"center_on_part": "pedicel",
"crop_size": null,
"crop_size_detection_padding": 16
"model": {
"backbone": {
"leap": null,
"unet": {
"stem_stride": null,
"max_stride": 32,
"output_stride": 4,
"filters": 48,
"filters_rate": 2.0,
"middle_block": true,
"up_interpolate": true,
"stacks": 1
"hourglass": null,
"resnet": null,
"pretrained_encoder": null
"heads": {
"single_instance": null,
"centroid": null,
"centered_instance": {
"anchor_part": "pedicel",
"part_names": null,
"sigma": 2.5,
"output_stride": 4,
"loss_weight": 1.0,
"offset_refinement": false
"multi_instance": null,
"multi_class_bottomup": null,
"multi_class_topdown": null
"optimization": {
"preload_data": true,
"augmentation_config": {
"rotate": true,
"rotation_min_angle": -180.0,
"rotation_max_angle": 180.0,
"translate": false,
"translate_min": -5,
"translate_max": 5,
"scale": true,
"scale_min": 0.9,
"scale_max": 1.1,
"uniform_noise": false,
"uniform_noise_min_val": 0.0,
"uniform_noise_max_val": 10.0,
"gaussian_noise": false,
"gaussian_noise_mean": 5.0,
"gaussian_noise_stddev": 1.0,
"contrast": false,
"contrast_min_gamma": 0.5,
"contrast_max_gamma": 2.0,
"brightness": true,
"brightness_min_val": 0.0,
"brightness_max_val": 10.0,
"random_crop": false,
"random_crop_height": 256,
"random_crop_width": 256,
"random_flip": false,
"flip_horizontal": true
"online_shuffling": true,
"shuffle_buffer_size": 128,
"prefetch": true,
"batch_size": 4,
"batches_per_epoch": null,
"min_batches_per_epoch": 200,
"val_batches_per_epoch": null,
"min_val_batches_per_epoch": 10,
"epochs": 200,
"optimizer": "adam",
"initial_learning_rate": 0.0001,
"learning_rate_schedule": {
"reduce_on_plateau": true,
"reduction_factor": 0.5,
"plateau_min_delta": 1e-06,
"plateau_patience": 5,
"plateau_cooldown": 3,
"min_learning_rate": 1e-08
"hard_keypoint_mining": {
"online_mining": true,
"hard_to_easy_ratio": 2.0,
"min_hard_keypoints": 3,
"max_hard_keypoints": null,
"loss_scale": 5.0
"early_stopping": {
"stop_training_on_plateau": true,
"plateau_min_delta": 1e-08,
"plateau_patience": 10
"outputs": {
"save_outputs": true,
"run_name": "230220_173034",
"run_name_prefix": "",
"run_name_suffix": ".centered_instance",
"runs_folder": "models",
"tags": [
"save_visualizations": true,
"delete_viz_images": true,
"zip_outputs": false,
"log_to_csv": true,
"checkpointing": {
"initial_model": false,
"best_model": true,
"every_epoch": false,
"latest_model": false,
"final_model": false
"tensorboard": {
"write_logs": false,
"loss_frequency": "epoch",
"architecture_graph": false,
"profile_graph": false,
"visualizations": true
"zmq": {
"subscribe_to_controller": false,
"controller_address": "tcp://",
"controller_polling_timeout": 10,
"publish_updates": false,
"publish_address": "tcp://"
"name": "",
"description": "",
"sleap_version": "1.2.9",
"filename": "centered_instance.json"
} GPU 0 with 40533 MiB of free memory. GPU 0 for acceleration. GPU memory pre-allocation.
GPUs: 1/1 available
Device: /physical_device:GPU:0
Available: True
Initalized: False
Memory growth: True trainer... training labels from: resolved_skeletons_with_predictions.pkg.slp training and validation splits from validation fraction: 0.1 Splits: Training = 271 / Validation = 30. up for training... up pipeline builders... up model... test pipeline...
2023-02-20 23:13:00.691758: W tensorflow/core/grappler/costs/] Error in PredictCost() for the op: op: "CropAndResize" attr { key: "T" value { type: DT_FLOAT } } attr { key: "extrapolation_value" value { f: 0 } } attr { key: "method" value { s: "bilinear" } } inputs { dtype: DT_FLOAT shape { dim { size: 1 } dim { size: 1080 } dim { size: 1920 } dim { size: 3 } } } inputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: 4 } } } inputs { dtype: DT_INT32 shape { dim { size: -2 } } } inputs { dtype: DT_INT32 shape { dim { size: 2 } } } device { type: "CPU" vendor: "GenuineIntel" model: "101" frequency: 2200 num_cores: 12 environment { key: "cpu_instruction_set" value: "AVX SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2" } environment { key: "eigen" value: "3.4.90" } l1_cache_size: 32768 l2_cache_size: 1048576 l3_cache_size: 40370176 memory_size: 268435456 } outputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: 544 } dim { size: 544 } dim { size: 3 } } } test example. [3.366s] Input shape: (544, 544, 3) Keras model. Backbone: UNet(stacks=1, filters=48, filters_rate=2.0, kernel_size=3, stem_kernel_size=7, convs_per_block=2, stem_blocks=0, down_blocks=5, middle_block=True, up_blocks=3, up_interpolate=True, block_contraction=False) Max stride: 32 Parameters: 70,331,019 Heads: [0] = CenteredInstanceConfmapsHead(part_names=['prosoma', 'pedicel', 'opisthosoma', 'pedipalpR1', 'pedipalpL1', 'antlegR1', 'antlegR2', 'antlegL1', 'antlegL2', 'forelegR1', 'forelegR2', 'forelegL1', 'forelegL2', 'midlegR1', 'midlegR2', 'midlegL1', 'midlegL2', 'hindlegR1', 'hindlegR2', 'hindlegL1', 'hindlegL2', 'pedipalpR2', 'pedipalpL2', 'antlegR3', 'antlegR4', 'antlegL3', 'antlegL4'], anchor_part='pedicel', sigma=2.5, output_stride=4, loss_weight=1.0) Outputs: [0] = KerasTensor(type_spec=TensorSpec(shape=(None, 136, 136, 27), dtype=tf.float32, name=None), name='CenteredInstanceConfmapsHead/BiasAdd:0', description="created by layer 'CenteredInstanceConfmapsHead'") up data pipelines... 
set: n = 271 set: n = 30 up optimization... OHKM enabled: HardKeypointMiningConfig(online_mining=True, hard_to_easy_ratio=2.0, min_hard_keypoints=3, max_hard_keypoints=None, loss_scale=5.0) Learning rate schedule: LearningRateScheduleConfig(reduce_on_plateau=True, reduction_factor=0.5, plateau_min_delta=1e-06, plateau_patience=5, plateau_cooldown=3, min_learning_rate=1e-08) Early stopping: EarlyStoppingConfig(stop_training_on_plateau=True, plateau_min_delta=1e-08, plateau_patience=10) up outputs... run path: models/230220_173034.centered_instance up visualization...
2023-02-20 23:13:02.432647: W tensorflow/core/grappler/costs/] Error in PredictCost() for the op: op: "CropAndResize" attr { key: "T" value { type: DT_FLOAT } } attr { key: "extrapolation_value" value { f: 0 } } attr { key: "method" value { s: "bilinear" } } inputs { dtype: DT_FLOAT shape { dim { size: 1 } dim { size: 1080 } dim { size: 1920 } dim { size: 3 } } } inputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: 4 } } } inputs { dtype: DT_INT32 shape { dim { size: -2 } } } inputs { dtype: DT_INT32 shape { dim { size: 2 } } } device { type: "CPU" vendor: "GenuineIntel" model: "101" frequency: 2200 num_cores: 12 environment { key: "cpu_instruction_set" value: "AVX SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2" } environment { key: "eigen" value: "3.4.90" } l1_cache_size: 32768 l2_cache_size: 1048576 l3_cache_size: 40370176 memory_size: 268435456 } outputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: 544 } dim { size: 544 } dim { size: 3 } } }
2023-02-20 23:13:03.627076: W tensorflow/core/grappler/costs/] Error in PredictCost() for the op: op: "CropAndResize" attr { key: "T" value { type: DT_FLOAT } } attr { key: "extrapolation_value" value { f: 0 } } attr { key: "method" value { s: "bilinear" } } inputs { dtype: DT_FLOAT shape { dim { size: 1 } dim { size: 1080 } dim { size: 1920 } dim { size: 3 } } } inputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: 4 } } } inputs { dtype: DT_INT32 shape { dim { size: -2 } } } inputs { dtype: DT_INT32 shape { dim { size: 2 } } } device { type: "CPU" vendor: "GenuineIntel" model: "101" frequency: 2200 num_cores: 12 environment { key: "cpu_instruction_set" value: "AVX SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2" } environment { key: "eigen" value: "3.4.90" } l1_cache_size: 32768 l2_cache_size: 1048576 l3_cache_size: 40370176 memory_size: 268435456 } outputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: 544 } dim { size: 544 } dim { size: 3 } } }
Unable to use Qt backend for matplotlib. This probably means Qt is running headless. trainer set up. [6.2s] for training data generation...
2023-02-20 23:13:15.674809: W tensorflow/core/grappler/costs/] Error in PredictCost() for the op: op: "CropAndResize" attr { key: "T" value { type: DT_FLOAT } } attr { key: "extrapolation_value" value { f: 0 } } attr { key: "method" value { s: "bilinear" } } inputs { dtype: DT_FLOAT shape { dim { size: 1 } dim { size: 1080 } dim { size: 1920 } dim { size: 3 } } } inputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: 4 } } } inputs { dtype: DT_INT32 shape { dim { size: -2 } } } inputs { dtype: DT_INT32 shape { dim { size: 2 } } } device { type: "CPU" vendor: "GenuineIntel" model: "101" frequency: 2200 num_cores: 12 environment { key: "cpu_instruction_set" value: "AVX SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2" } environment { key: "eigen" value: "3.4.90" } l1_cache_size: 32768 l2_cache_size: 1048576 l3_cache_size: 40370176 memory_size: 268435456 } outputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: 544 } dim { size: 544 } dim { size: 3 } } }
2023-02-20 23:13:18.846102: W tensorflow/core/grappler/costs/] Error in PredictCost() for the op: op: "CropAndResize" attr { key: "T" value { type: DT_FLOAT } } attr { key: "extrapolation_value" value { f: 0 } } attr { key: "method" value { s: "bilinear" } } inputs { dtype: DT_FLOAT shape { dim { size: 1 } dim { size: 1080 } dim { size: 1920 } dim { size: 3 } } } inputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: 4 } } } inputs { dtype: DT_INT32 shape { dim { size: -2 } } } inputs { dtype: DT_INT32 shape { dim { size: 2 } } } device { type: "CPU" vendor: "GenuineIntel" model: "101" frequency: 2200 num_cores: 12 environment { key: "cpu_instruction_set" value: "AVX SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2" } environment { key: "eigen" value: "3.4.90" } l1_cache_size: 32768 l2_cache_size: 1048576 l3_cache_size: 40370176 memory_size: 268435456 } outputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: 544 } dim { size: 544 } dim { size: 3 } } } creating training datasets. [15.6s] training loop...
2023-02-20 23:13:19.593550: W tensorflow/core/grappler/costs/] Error in PredictCost() for the op: op: "CropAndResize" attr { key: "T" value { type: DT_FLOAT } } attr { key: "extrapolation_value" value { f: 0 } } attr { key: "method" value { s: "bilinear" } } inputs { dtype: DT_FLOAT shape { dim { size: 1 } dim { size: 1080 } dim { size: 1920 } dim { size: 3 } } } inputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: 4 } } } inputs { dtype: DT_INT32 shape { dim { size: -2 } } } inputs { dtype: DT_INT32 shape { dim { size: 2 } } } device { type: "CPU" vendor: "GenuineIntel" model: "101" frequency: 2200 num_cores: 12 environment { key: "cpu_instruction_set" value: "AVX SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2" } environment { key: "eigen" value: "3.4.90" } l1_cache_size: 32768 l2_cache_size: 1048576 l3_cache_size: 40370176 memory_size: 268435456 } outputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: 544 } dim { size: 544 } dim { size: 3 } } }
Epoch 1/200
2023-02-20 23:13:56.422111: W tensorflow/core/grappler/costs/] Error in PredictCost() for the op: op: "CropAndResize" attr { key: "T" value { type: DT_FLOAT } } attr { key: "extrapolation_value" value { f: 0 } } attr { key: "method" value { s: "bilinear" } } inputs { dtype: DT_FLOAT shape { dim { size: 1 } dim { size: 1080 } dim { size: 1920 } dim { size: 3 } } } inputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: 4 } } } inputs { dtype: DT_INT32 shape { dim { size: -2 } } } inputs { dtype: DT_INT32 shape { dim { size: 2 } } } device { type: "CPU" vendor: "GenuineIntel" model: "101" frequency: 2200 num_cores: 12 environment { key: "cpu_instruction_set" value: "AVX SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2" } environment { key: "eigen" value: "3.4.90" } l1_cache_size: 32768 l2_cache_size: 1048576 l3_cache_size: 40370176 memory_size: 268435456 } outputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: 544 } dim { size: 544 } dim { size: 3 } } }
2023-02-20 23:14:02.495095: W tensorflow/core/grappler/costs/] Error in PredictCost() for the op: op: "CropAndResize" attr { key: "T" value { type: DT_FLOAT } } attr { key: "extrapolation_value" value { f: 0 } } attr { key: "method" value { s: "bilinear" } } inputs { dtype: DT_FLOAT shape { dim { size: 27 } dim { size: 136 } dim { size: 136 } dim { size: 1 } } } inputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: 4 } } } inputs { dtype: DT_INT32 shape { dim { size: -2 } } } inputs { dtype: DT_INT32 shape { dim { size: 2 } } } device { type: "GPU" vendor: "NVIDIA" model: "NVIDIA A100-SXM4-40GB" frequency: 1410 num_cores: 108 environment { key: "architecture" value: "8.0" } environment { key: "cuda" value: "11020" } environment { key: "cudnn" value: "8100" } num_registers: 65536 l1_cache_size: 24576 l2_cache_size: 41943040 shared_memory_size_per_multiprocessor: 167936 memory_size: 40233992192 bandwidth: 1555200000 } outputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: -5 } dim { size: -6 } dim { size: 1 } } }
200/200 - 46s - loss: 0.0064 - ohkm: 0.0053 - prosoma: 0.0010 - pedicel: 0.0010 - opisthosoma: 0.0011 - pedipalpR1: 0.0010 - pedipalpL1: 0.0010 - antlegR1: 0.0011 - antlegR2: 0.0011 - antlegL1: 0.0011 - antlegL2: 0.0011 - forelegR1: 0.0010 - forelegR2: 0.0011 - forelegL1: 0.0011 - forelegL2: 0.0011 - midlegR1: 0.0010 - midlegR2: 0.0011 - midlegL1: 0.0010 - midlegL2: 0.0011 - hindlegR1: 0.0010 - hindlegR2: 0.0011 - hindlegL1: 0.0010 - hindlegL2: 0.0011 - pedipalpR2: 0.0011 - pedipalpL2: 0.0011 - antlegR3: 0.0010 - antlegR4: 9.9843e-04 - antlegL3: 0.0011 - antlegL4: 0.0010 - val_loss: 0.0063 - val_ohkm: 0.0053 - val_prosoma: 0.0010 - val_pedicel: 9.7641e-04 - val_opisthosoma: 0.0010 - val_pedipalpR1: 0.0010 - val_pedipalpL1: 0.0010 - val_antlegR1: 0.0010 - val_antlegR2: 0.0011 - val_antlegL1: 0.0010 - val_antlegL2: 0.0011 - val_forelegR1: 0.0010 - val_forelegR2: 0.0011 - val_forelegL1: 0.0010 - val_forelegL2: 0.0011 - val_midlegR1: 0.0010 - val_midlegR2: 0.0011 - val_midlegL1: 0.0010 - val_midlegL2: 0.0011 - val_hindlegR1: 0.0010 - val_hindlegR2: 0.0011 - val_hindlegL1: 0.0010 - val_hindlegL2: 0.0011 - val_pedipalpR2: 0.0010 - val_pedipalpL2: 0.0010 - val_antlegR3: 0.0011 - val_antlegR4: 9.6961e-04 - val_antlegL3: 0.0011 - val_antlegL4: 9.9553e-04 - lr: 1.0000e-04 - 46s/epoch - 232ms/step
Epoch 2/200
2023-02-20 23:14:33.058386: W tensorflow/core/grappler/costs/] Error in PredictCost() for the op: op: "CropAndResize" attr { key: "T" value { type: DT_FLOAT } } attr { key: "extrapolation_value" value { f: 0 } } attr { key: "method" value { s: "bilinear" } } inputs { dtype: DT_FLOAT shape { dim { size: 1 } dim { size: 1080 } dim { size: 1920 } dim { size: 3 } } } inputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: 4 } } } inputs { dtype: DT_INT32 shape { dim { size: -2 } } } inputs { dtype: DT_INT32 shape { dim { size: 2 } } } device { type: "CPU" vendor: "GenuineIntel" model: "101" frequency: 2200 num_cores: 12 environment { key: "cpu_instruction_set" value: "AVX SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2" } environment { key: "eigen" value: "3.4.90" } l1_cache_size: 32768 l2_cache_size: 1048576 l3_cache_size: 40370176 memory_size: 268435456 } outputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: 544 } dim { size: 544 } dim { size: 3 } } }
200/200 - 39s - loss: 0.0063 - ohkm: 0.0053 - prosoma: 9.8153e-04 - pedicel: 9.6689e-04 - opisthosoma: 0.0010 - pedipalpR1: 9.9786e-04 - pedipalpL1: 9.9719e-04 - antlegR1: 0.0010 - antlegR2: 0.0010 - antlegL1: 0.0010 - antlegL2: 0.0010 - forelegR1: 0.0010 - forelegR2: 0.0010 - forelegL1: 0.0010 - forelegL2: 0.0010 - midlegR1: 0.0010 - midlegR2: 0.0010 - midlegL1: 0.0010 - midlegL2: 0.0010 - hindlegR1: 0.0010 - hindlegR2: 0.0010 - hindlegL1: 0.0010 - hindlegL2: 0.0010 - pedipalpR2: 0.0010 - pedipalpL2: 0.0010 - antlegR3: 0.0010 - antlegR4: 9.8585e-04 - antlegL3: 0.0010 - antlegL4: 0.0010 - val_loss: 0.0063 - val_ohkm: 0.0052 - val_prosoma: 9.7833e-04 - val_pedicel: 9.5453e-04 - val_opisthosoma: 0.0010 - val_pedipalpR1: 0.0010 - val_pedipalpL1: 0.0010 - val_antlegR1: 0.0010 - val_antlegR2: 0.0010 - val_antlegL1: 0.0010 - val_antlegL2: 0.0010 - val_forelegR1: 0.0010 - val_forelegR2: 0.0010 - val_forelegL1: 0.0010 - val_forelegL2: 0.0010 - val_midlegR1: 0.0010 - val_midlegR2: 0.0010 - val_midlegL1: 0.0010 - val_midlegL2: 0.0010 - val_hindlegR1: 0.0010 - val_hindlegR2: 0.0010 - val_hindlegL1: 0.0010 - val_hindlegL2: 0.0010 - val_pedipalpR2: 0.0010 - val_pedipalpL2: 0.0010 - val_antlegR3: 0.0010 - val_antlegR4: 9.8150e-04 - val_antlegL3: 0.0010 - val_antlegL4: 0.0010 - lr: 1.0000e-04 - 39s/epoch - 195ms/step
...until I force-stopped the process. I appreciate any help you can provide.
Hi @amblypatty,
Originally, we thought this error was caused only by plotting the visualizations (confidence maps overlaid on instances) during training. After tracking it down, however, we found that the real problem is that our pipeline for the top-down model is not set up to handle input scaling on the second model (the centered instance model). Your input_scaling is set to the default of 1.0, so we don't expect you to hit this particular error.
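To make the input scaling issue concrete, here is a minimal sketch of the shape arithmetic, not SLEAP's actual preprocessing code. The helper names (scaled_input_shape, confmap_shape), the max stride of 16, and the 0.5 scaling value are assumptions for illustration; the 544 crop size and the 136 confidence map size are taken from the CropAndResize shapes in the log above.

```python
import math


def scaled_input_shape(height, width, input_scaling, max_stride):
    """Resize by input_scaling, then pad each side up to a multiple of max_stride.

    Hypothetical helper for illustration only.
    """
    scaled_h = int(height * input_scaling)
    scaled_w = int(width * input_scaling)
    padded_h = math.ceil(scaled_h / max_stride) * max_stride
    padded_w = math.ceil(scaled_w / max_stride) * max_stride
    return padded_h, padded_w


def confmap_shape(input_shape, output_stride):
    """Spatial size of the confidence maps for a given output stride."""
    return input_shape[0] // output_stride, input_shape[1] // output_stride


# Crop size from the log above; max_stride=16 is an assumption.
crop_at_default = scaled_input_shape(544, 544, input_scaling=1.0, max_stride=16)
# Hypothetical non-default scaling applied to the same crop.
crop_at_half = scaled_input_shape(544, 544, input_scaling=0.5, max_stride=16)

print(crop_at_default, confmap_shape(crop_at_default, output_stride=4))
print(crop_at_half, confmap_shape(crop_at_half, output_stride=4))
```

With a scaling of 1.0 the crop keeps the size the model was built for; with any other value the two sizes diverge, which is the kind of mismatch that produces an incompatible-shape error if one part of the pipeline applies the scaling and another does not.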
Unless I overlooked something, the logs seem to indicate that training has completed the 2nd epoch and is about to head into the 3rd epoch? Some clarifying questions: Are the logs truncated? What behavior are you experiencing?
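One way to check this is to separate TensorFlow's own log lines from the Keras per-epoch summaries. TensorFlow prefixes its C++ log lines with a severity letter (I, W, E, or F), and the PredictCost() lines above carry W, so they are logged as warnings. The following is a rough sketch; the function name summarize_log and the regular expressions are hypothetical, and the sample lines are abbreviated from the log above.

```python
import re

# TensorFlow C++ log lines look like "<date> <time>: <severity> <file>] <message>".
TF_LOG = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+: ([IWEF]) ")
# Keras prints one summary line per finished epoch, e.g. "200/200 - 46s - loss: ...".
EPOCH_SUMMARY = re.compile(r"^\d+/\d+ - \d+s - loss: ")


def summarize_log(lines):
    """Count TensorFlow log lines by severity and count finished epochs."""
    counts = {"I": 0, "W": 0, "E": 0, "F": 0}
    finished_epochs = 0
    for line in lines:
        match = TF_LOG.match(line)
        if match:
            counts[match.group(1)] += 1
        elif EPOCH_SUMMARY.match(line):
            finished_epochs += 1
    return counts, finished_epochs


# Abbreviated sample lines, shortened from the log in this thread.
sample = [
    "Epoch 1/200",
    '2023-02-20 23:13:56.422111: W tensorflow/core/grappler/costs/] Error in PredictCost() for the op: op: "CropAndResize"',
    "200/200 - 46s - loss: 0.0064 - ohkm: 0.0053",
    "Epoch 2/200",
    "200/200 - 39s - loss: 0.0063 - ohkm: 0.0053",
]

counts, finished = summarize_log(sample)
print(counts, finished)
```

A nonzero W count alongside a growing number of finished epochs suggests that training is progressing despite the messages. It does not by itself explain missing predictions or metrics.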
Thanks, Liezl
Hi @roomrys,
Indeed, I terminated the process after seeing that the PredictCost() function failed in the first epoch:
Epoch 1/200
2023-02-20 23:13:56.422111: W tensorflow/core/grappler/costs/] Error in PredictCost() for the op: op: "CropAndResize" attr { key: "T" value { type: DT_FLOAT } } attr { key: "extrapolation_value" value { f: 0 } } attr { key: "method" value { s: "bilinear" } } inputs { dtype: DT_FLOAT shape { dim { size: 1 } dim { size: 1080 } dim { size: 1920 } dim { size: 3 } } } inputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: 4 } } } inputs { dtype: DT_INT32 shape { dim { size: -2 } } } inputs { dtype: DT_INT32 shape { dim { size: 2 } } } device { type: "CPU" vendor: "GenuineIntel" model: "101" frequency: 2200 num_cores: 12 environment { key: "cpu_instruction_set" value: "AVX SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2" } environment { key: "eigen" value: "3.4.90" } l1_cache_size: 32768 l2_cache_size: 1048576 l3_cache_size: 40370176 memory_size: 268435456 } outputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: 544 } dim { size: 544 } dim { size: 3 } } }
2023-02-20 23:14:02.495095: W tensorflow/core/grappler/costs/] Error in PredictCost() for the op: op: "CropAndResize" attr { key: "T" value { type: DT_FLOAT } } attr { key: "extrapolation_value" value { f: 0 } } attr { key: "method" value { s: "bilinear" } } inputs { dtype: DT_FLOAT shape { dim { size: 27 } dim { size: 136 } dim { size: 136 } dim { size: 1 } } } inputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: 4 } } } inputs { dtype: DT_INT32 shape { dim { size: -2 } } } inputs { dtype: DT_INT32 shape { dim { size: 2 } } } device { type: "GPU" vendor: "NVIDIA" model: "NVIDIA A100-SXM4-40GB" frequency: 1410 num_cores: 108 environment { key: "architecture" value: "8.0" } environment { key: "cuda" value: "11020" } environment { key: "cudnn" value: "8100" } num_registers: 65536 l1_cache_size: 24576 l2_cache_size: 41943040 shared_memory_size_per_multiprocessor: 167936 memory_size: 40233992192 bandwidth: 1555200000 } outputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: -5 } dim { size: -6 } dim { size: 1 } } }
When I previously trained the top-down model with this error appearing in each epoch (the warning first shows up even earlier), the result was a predictions.pkg.slp file with 'mean scores' but no instances on the suggested frames when I run:
!sleap-track \
-m models/230218_232711.centroid \
-m models/230218_232711.centered_instance \
--only-suggested-frames \
-o 230218_232711_predicted_suggestions.slp \
This produces a complete prediction run (with PredictCost() errors):
INFO:numexpr.utils:Note: NumExpr detected 12 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 8.
INFO:numexpr.utils:NumExpr defaulting to 8 threads.
Started inference at: 2023-02-19 05:15:02.414019
│ 'data_path': 'resolved_skeletons_with_predictions.pkg.slp',
│ 'models': [
│ │ 'models/230218_232711.centroid',
│ │ 'models/230218_232711.centered_instance'
│ ],
│ 'frames': '',
│ 'only_labeled_frames': False,
│ 'only_suggested_frames': True,
│ 'output': '230218_232711_predicted_suggestions.slp',
│ 'no_empty_frames': False,
│ 'verbosity': 'rich',
│ 'video.dataset': None,
│ 'video.input_format': 'channels_last',
│ 'video.index': '',
│ 'cpu': False,
│ 'first_gpu': False,
│ 'last_gpu': False,
│ 'gpu': 'auto',
│ 'max_edge_length_ratio': 0.25,
│ 'dist_penalty_weight': 1.0,
│ 'batch_size': 4,
│ 'open_in_gui': False,
│ 'peak_threshold': 0.2,
│ 'tracking.tracker': None,
│ 'tracking.target_instance_count': None,
│ 'tracking.pre_cull_to_target': None,
│ 'tracking.pre_cull_iou_threshold': None,
│ 'tracking.post_connect_single_breaks': None,
│ 'tracking.clean_instance_count': None,
│ 'tracking.clean_iou_threshold': None,
│ 'tracking.similarity': None,
│ 'tracking.match': None,
│ 'tracking.track_window': None,
│ 'tracking.min_new_track_points': None,
│ 'tracking.min_match_points': None,
│ 'tracking.img_scale': None,
│ 'tracking.of_window_size': None,
│ 'tracking.of_max_levels': None,
│ 'tracking.save_shifted_instances': None,
│ 'tracking.kf_node_indices': None,
│ 'tracking.kf_init_frame_count': None
INFO:sleap.nn.inference:Auto-selected GPU 0 with 40533 MiB of free memory.
SLEAP: 1.2.9
TensorFlow: 2.8.4
Numpy: 1.21.6
Python: 3.8.10
OS: Linux-5.10.147+-x86_64-with-glibc2.29
GPUs: 1/1 available
Device: /physical_device:GPU:0
Available: True
Initalized: False
Memory growth: True
Predicting... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0% ETA: -:--:-- ?2023-02-19 05:15:29.605879: W tensorflow/core/grappler/costs/] Error in PredictCost() for the op: op: "CropAndResize" attr { key: "T" value { type: DT_FLOAT } } attr { key: "extrapolation_value" value { f: 0 } } attr { key: "method" value { s: "bilinear" } } inputs { dtype: DT_FLOAT shape { dim { size: -51 } dim { size: -52 } dim { size: -53 } dim { size: 1 } } } inputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: 4 } } } inputs { dtype: DT_INT32 shape { dim { size: -2 } } } inputs { dtype: DT_INT32 shape { dim { size: 2 } } } device { type: "GPU" vendor: "NVIDIA" model: "NVIDIA A100-SXM4-40GB" frequency: 1410 num_cores: 108 environment { key: "architecture" value: "8.0" } environment { key: "cuda" value: "11020" } environment { key: "cudnn" value: "8100" } num_registers: 65536 l1_cache_size: 24576 l2_cache_size: 41943040 shared_memory_size_per_multiprocessor: 167936 memory_size: 40233992192 bandwidth: 1555200000 } outputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: -54 } dim { size: -55 } dim { size: 1 } } }
2023-02-19 05:15:29.606433: W tensorflow/core/grappler/costs/] Error in PredictCost() for the op: op: "CropAndResize" attr { key: "T" value { type: DT_UINT8 } } attr { key: "extrapolation_value" value { f: 0 } } attr { key: "method" value { s: "bilinear" } } inputs { dtype: DT_UINT8 shape { dim { size: 4 } dim { size: 1080 } dim { size: 1920 } dim { size: 3 } } } inputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: 4 } } } inputs { dtype: DT_INT32 shape { dim { size: -2 } } } inputs { dtype: DT_INT32 shape { dim { size: 2 } } } device { type: "CPU" vendor: "GenuineIntel" model: "101" frequency: 2200 num_cores: 12 environment { key: "cpu_instruction_set" value: "AVX SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2" } environment { key: "eigen" value: "3.4.90" } l1_cache_size: 32768 l2_cache_size: 1048576 l3_cache_size: 40370176 memory_size: 268435456 } outputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: -59 } dim { size: -60 } dim { size: 3 } } }
2023-02-19 05:15:29.613101: W tensorflow/core/grappler/costs/] Error in PredictCost() for the op: op: "CropAndResize" attr { key: "T" value { type: DT_FLOAT } } attr { key: "extrapolation_value" value { f: 0 } } attr { key: "method" value { s: "bilinear" } } inputs { dtype: DT_FLOAT shape { dim { size: -109 } dim { size: -110 } dim { size: -111 } dim { size: 1 } } } inputs { dtype: DT_FLOAT shape { dim { size: -6 } dim { size: 4 } } } inputs { dtype: DT_INT32 shape { dim { size: -6 } } } inputs { dtype: DT_INT32 shape { dim { size: 2 } } } device { type: "GPU" vendor: "NVIDIA" model: "NVIDIA A100-SXM4-40GB" frequency: 1410 num_cores: 108 environment { key: "architecture" value: "8.0" } environment { key: "cuda" value: "11020" } environment { key: "cudnn" value: "8100" } num_registers: 65536 l1_cache_size: 24576 l2_cache_size: 41943040 shared_memory_size_per_multiprocessor: 167936 memory_size: 40233992192 bandwidth: 1555200000 } outputs { dtype: DT_FLOAT shape { dim { size: -6 } dim { size: -112 } dim { size: -113 } dim { size: 1 } } }
Predicting... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╸━━ 94% ETA: 0:00:01 58.1 FPS2023-02-19 05:15:35.958628: W tensorflow/core/grappler/costs/] Error in PredictCost() for the op: op: "CropAndResize" attr { key: "T" value { type: DT_FLOAT } } attr { key: "extrapolation_value" value { f: 0 } } attr { key: "method" value { s: "bilinear" } } inputs { dtype: DT_FLOAT shape { dim { size: -51 } dim { size: -52 } dim { size: -53 } dim { size: 1 } } } inputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: 4 } } } inputs { dtype: DT_INT32 shape { dim { size: -2 } } } inputs { dtype: DT_INT32 shape { dim { size: 2 } } } device { type: "GPU" vendor: "NVIDIA" model: "NVIDIA A100-SXM4-40GB" frequency: 1410 num_cores: 108 environment { key: "architecture" value: "8.0" } environment { key: "cuda" value: "11020" } environment { key: "cudnn" value: "8100" } num_registers: 65536 l1_cache_size: 24576 l2_cache_size: 41943040 shared_memory_size_per_multiprocessor: 167936 memory_size: 40233992192 bandwidth: 1555200000 } outputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: -54 } dim { size: -55 } dim { size: 1 } } }
2023-02-19 05:15:35.959166: W tensorflow/core/grappler/costs/] Error in PredictCost() for the op: op: "CropAndResize" attr { key: "T" value { type: DT_UINT8 } } attr { key: "extrapolation_value" value { f: 0 } } attr { key: "method" value { s: "bilinear" } } inputs { dtype: DT_UINT8 shape { dim { size: 3 } dim { size: 1080 } dim { size: 1920 } dim { size: 3 } } } inputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: 4 } } } inputs { dtype: DT_INT32 shape { dim { size: -2 } } } inputs { dtype: DT_INT32 shape { dim { size: 2 } } } device { type: "CPU" vendor: "GenuineIntel" model: "101" frequency: 2200 num_cores: 12 environment { key: "cpu_instruction_set" value: "AVX SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2" } environment { key: "eigen" value: "3.4.90" } l1_cache_size: 32768 l2_cache_size: 1048576 l3_cache_size: 40370176 memory_size: 268435456 } outputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: -59 } dim { size: -60 } dim { size: 3 } } }
2023-02-19 05:15:35.966002: W tensorflow/core/grappler/costs/] Error in PredictCost() for the op: op: "CropAndResize" attr { key: "T" value { type: DT_FLOAT } } attr { key: "extrapolation_value" value { f: 0 } } attr { key: "method" value { s: "bilinear" } } inputs { dtype: DT_FLOAT shape { dim { size: -109 } dim { size: -110 } dim { size: -111 } dim { size: 1 } } } inputs { dtype: DT_FLOAT shape { dim { size: -6 } dim { size: 4 } } } inputs { dtype: DT_INT32 shape { dim { size: -6 } } } inputs { dtype: DT_INT32 shape { dim { size: 2 } } } device { type: "GPU" vendor: "NVIDIA" model: "NVIDIA A100-SXM4-40GB" frequency: 1410 num_cores: 108 environment { key: "architecture" value: "8.0" } environment { key: "cuda" value: "11020" } environment { key: "cudnn" value: "8100" } num_registers: 65536 l1_cache_size: 24576 l2_cache_size: 41943040 shared_memory_size_per_multiprocessor: 167936 memory_size: 40233992192 bandwidth: 1555200000 } outputs { dtype: DT_FLOAT shape { dim { size: -6 } dim { size: -112 } dim { size: -113 } dim { size: 1 } } }
Predicting... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% ETA: 0:00:00 13.9 FPS
Finished inference at: 2023-02-19 05:15:37.534322
Total runtime: 35.120323181152344 secs
Predicted frames: 51/51
│ 'model_paths': [
│ │ 'models/230218_232711.centroid/training_config.json',
│ │ 'models/230218_232711.centered_instance/training_config.json'
│ ],
│ 'predictor': 'TopDownPredictor',
│ 'sleap_version': '1.2.9',
│ 'platform': 'Linux-5.10.147+-x86_64-with-glibc2.29',
│ 'command': '/usr/local/bin/sleap-track -m models/230218_232711.centroid -m models/230218_232711.centered_instance --only-suggested-frames -o 230218_232711_predicted_suggestions.slp resolved_skeletons_with_predictions.pkg.slp',
│ 'data_path': 'resolved_skeletons_with_predictions.pkg.slp',
│ 'output_path': '230218_232711_predicted_suggestions.slp',
│ 'total_elapsed': 35.120323181152344,
│ 'start_timestamp': '2023-02-19 05:15:02.414019',
│ 'finish_timestamp': '2023-02-19 05:15:37.534322'
Saved output: 230218_232711_predicted_suggestions.slp
...and then merge the predictions in the SLEAP GUI. Additionally, there are no metrics for the centered_instance model:
The image above shows, in the background, a suggested frame (313) that has a mean score but no predicted instance. In the foreground, the evaluation metrics window shows empty cells for the most recent centered_instance model, whereas the previous centered_instance model shows its metrics (as expected).
Thanks for your help, Patrick
Hello @roomrys and @talmo,
I am still experiencing this issue, even in the newest 1.3.0a0 release. I have tried redoing this with a few different hyperparameters to try and get the previously expected behavior, but I am still experiencing an error in the PredictCost() function. I am afraid I don't really know what it means or how to get around it. I would really appreciate some help on this one.
Here is the latest output from my top-down training, first from the Centroid and then the Centered-Instance:
trainer...
training labels from: resolved_skeletons_with_predictions.pkg.slp
training and validation splits from validation fraction: 0.1
Splits: Training = 271 / Validation = 30.
up for training...
up pipeline builders...
up model...
test pipeline...
test example. [2.734s]
Input shape: (544, 960, 3)
Keras model.
Backbone: UNet(stacks=1, filters=16, filters_rate=2.0, kernel_size=3, stem_kernel_size=7, convs_per_block=2, stem_blocks=0, down_blocks=4, middle_block=True, up_blocks=3, up_interpolate=True, block_contraction=False)
Max stride: 16
Parameters: 1,953,393
Heads: [0] = CentroidConfmapsHead(anchor_part='pedicel', sigma=2.5, output_stride=2, loss_weight=1.0)
Outputs: [0] = KerasTensor(type_spec=TensorSpec(shape=(None, 272, 480, 1), dtype=tf.float32, name=None), name='CentroidConfmapsHead/BiasAdd:0', description="created by layer 'CentroidConfmapsHead'")
up data pipelines...
set: n = 271
set: n = 30
up optimization...
Learning rate schedule: LearningRateScheduleConfig(reduce_on_plateau=True, reduction_factor=0.5, plateau_min_delta=1e-06, plateau_patience=5, plateau_cooldown=3, min_learning_rate=1e-08)
Early stopping: EarlyStoppingConfig(stop_training_on_plateau=True, plateau_min_delta=1e-08, plateau_patience=20)
up outputs...
run path: models/230312_144956.centroid
up visualization...
2023-03-12 19:06:42.229412: W tensorflow/core/grappler/costs/] Error in PredictCost() for the op: op: "CropAndResize" attr { key: "T" value { type: DT_FLOAT } } attr { key: "extrapolation_value" value { f: 0 } } attr { key: "method" value { s: "bilinear" } } inputs { dtype: DT_FLOAT shape { dim { size: -34 } dim { size: -35 } dim { size: -36 } dim { size: 1 } } } inputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: 4 } } } inputs { dtype: DT_INT32 shape { dim { size: -2 } } } inputs { dtype: DT_INT32 shape { dim { size: 2 } } } device { type: "GPU" vendor: "NVIDIA" model: "NVIDIA A100-SXM4-40GB" frequency: 1410 num_cores: 108 environment { key: "architecture" value: "8.0" } environment { key: "cuda" value: "11020" } environment { key: "cudnn" value: "8100" } num_registers: 65536 l1_cache_size: 24576 l2_cache_size: 41943040 shared_memory_size_per_multiprocessor: 167936 memory_size: 40202993664 bandwidth: 1555200000 } outputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: -37 } dim { size: -38 } dim { size: 1 } } }
2023-03-12 19:06:43.501136: W tensorflow/core/grappler/costs/] Error in PredictCost() for the op: op: "CropAndResize" attr { key: "T" value { type: DT_FLOAT } } attr { key: "extrapolation_value" value { f: 0 } } attr { key: "method" value { s: "bilinear" } } inputs { dtype: DT_FLOAT shape { dim { size: -34 } dim { size: -35 } dim { size: -36 } dim { size: 1 } } } inputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: 4 } } } inputs { dtype: DT_INT32 shape { dim { size: -2 } } } inputs { dtype: DT_INT32 shape { dim { size: 2 } } } device { type: "GPU" vendor: "NVIDIA" model: "NVIDIA A100-SXM4-40GB" frequency: 1410 num_cores: 108 environment { key: "architecture" value: "8.0" } environment { key: "cuda" value: "11020" } environment { key: "cudnn" value: "8100" } num_registers: 65536 l1_cache_size: 24576 l2_cache_size: 41943040 shared_memory_size_per_multiprocessor: 167936 memory_size: 40202993664 bandwidth: 1555200000 } outputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: -37 } dim { size: -38 } dim { size: 1 } } }
Unable to use Qt backend for matplotlib. This probably means Qt is running headless.
Unable to use Qt backend for matplotlib. This probably means Qt is running headless. trainer set up. [6.9s] for training data generation... creating training datasets. [14.2s] training loop...
Epoch 1/200
200/200 - 38s - loss: 9.3239e-05 - val_loss: 5.8659e-05 - lr: 1.0000e-04 - 38s/epoch - 190ms/step
Epoch 2/200
200/200 - 22s - loss: 3.1217e-05 - val_loss: 2.7985e-05 - lr: 1.0000e-04 - 22s/epoch - 111ms/step
Epoch 3/200
200/200 - 23s - loss: 1.8750e-05 - val_loss: 1.7997e-05 - lr: 1.0000e-04 - 23s/epoch - 113ms/step
... (truncated here, as training continues in the same way) ...
Epoch 46: ReduceLROnPlateau reducing learning rate to 3.12499992105586e-06.
200/200 - 21s - loss: 2.8520e-06 - val_loss: 4.5091e-06 - lr: 6.2500e-06 - 21s/epoch - 107ms/step
Epoch 47/200
200/200 - 22s - loss: 3.1557e-06 - val_loss: 1.9466e-06 - lr: 3.1250e-06 - 22s/epoch - 108ms/step
Epoch 47: early stopping training loop. [17.6 min] visualization directory: models/230312_144956.centroid/viz evaluation metrics to model folder...
Predicting... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0% ETA: -:--:-- ?2023-03-12 19:24:35.575939: W tensorflow/core/grappler/costs/] Error in PredictCost() for the op: op: "CropAndResize" attr { key: "T" value { type: DT_FLOAT } } attr { key: "extrapolation_value" value { f: 0 } } attr { key: "method" value { s: "bilinear" } } inputs { dtype: DT_FLOAT shape { dim { size: -74 } dim { size: -75 } dim { size: -76 } dim { size: 1 } } } inputs { dtype: DT_FLOAT shape { dim { size: -5 } dim { size: 4 } } } inputs { dtype: DT_INT32 shape { dim { size: -5 } } } inputs { dtype: DT_INT32 shape { dim { size: 2 } } } device { type: "GPU" vendor: "NVIDIA" model: "NVIDIA A100-SXM4-40GB" frequency: 1410 num_cores: 108 environment { key: "architecture" value: "8.0" } environment { key: "cuda" value: "11020" } environment { key: "cudnn" value: "8100" } num_registers: 65536 l1_cache_size: 24576 l2_cache_size: 41943040 shared_memory_size_per_multiprocessor: 167936 memory_size: 40202993664 bandwidth: 1555200000 } outputs { dtype: DT_FLOAT shape { dim { size: -5 } dim { size: -77 } dim { size: -78 } dim { size: 1 } } }
2023-03-12 19:24:35.576331: W tensorflow/core/grappler/costs/] Error in PredictCost() for the op: op: "CropAndResize" attr { key: "T" value { type: DT_UINT8 } } attr { key: "extrapolation_value" value { f: 0 } } attr { key: "method" value { s: "bilinear" } } inputs { dtype: DT_UINT8 shape { dim { size: 4 } dim { size: 1080 } dim { size: 1920 } dim { size: 3 } } } inputs { dtype: DT_FLOAT shape { dim { size: -5 } dim { size: 4 } } } inputs { dtype: DT_INT32 shape { dim { size: -5 } } } inputs { dtype: DT_INT32 shape { dim { size: 2 } } } device { type: "CPU" vendor: "GenuineIntel" model: "101" frequency: 2200 num_cores: 12 environment { key: "cpu_instruction_set" value: "AVX SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2" } environment { key: "eigen" value: "3.4.90" } l1_cache_size: 32768 l2_cache_size: 1048576 l3_cache_size: 40370176 memory_size: 268435456 } outputs { dtype: DT_FLOAT shape { dim { size: -5 } dim { size: -83 } dim { size: -84 } dim { size: 3 } } }
Predicting... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╸ 99% ETA: 0:00:01 27.9 FPS2023-03-12 19:24:45.964082: W tensorflow/core/grappler/costs/] Error in PredictCost() for the op: op: "CropAndResize" attr { key: "T" value { type: DT_FLOAT } } attr { key: "extrapolation_value" value { f: 0 } } attr { key: "method" value { s: "bilinear" } } inputs { dtype: DT_FLOAT shape { dim { size: -74 } dim { size: -75 } dim { size: -76 } dim { size: 1 } } } inputs { dtype: DT_FLOAT shape { dim { size: -5 } dim { size: 4 } } } inputs { dtype: DT_INT32 shape { dim { size: -5 } } } inputs { dtype: DT_INT32 shape { dim { size: 2 } } } device { type: "GPU" vendor: "NVIDIA" model: "NVIDIA A100-SXM4-40GB" frequency: 1410 num_cores: 108 environment { key: "architecture" value: "8.0" } environment { key: "cuda" value: "11020" } environment { key: "cudnn" value: "8100" } num_registers: 65536 l1_cache_size: 24576 l2_cache_size: 41943040 shared_memory_size_per_multiprocessor: 167936 memory_size: 40202993664 bandwidth: 1555200000 } outputs { dtype: DT_FLOAT shape { dim { size: -5 } dim { size: -77 } dim { size: -78 } dim { size: 1 } } }
2023-03-12 19:24:45.964449: W tensorflow/core/grappler/costs/] Error in PredictCost() for the op: op: "CropAndResize" attr { key: "T" value { type: DT_UINT8 } } attr { key: "extrapolation_value" value { f: 0 } } attr { key: "method" value { s: "bilinear" } } inputs { dtype: DT_UINT8 shape { dim { size: 3 } dim { size: 1080 } dim { size: 1920 } dim { size: 3 } } } inputs { dtype: DT_FLOAT shape { dim { size: -5 } dim { size: 4 } } } inputs { dtype: DT_INT32 shape { dim { size: -5 } } } inputs { dtype: DT_INT32 shape { dim { size: 2 } } } device { type: "CPU" vendor: "GenuineIntel" model: "101" frequency: 2200 num_cores: 12 environment { key: "cpu_instruction_set" value: "AVX SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2" } environment { key: "eigen" value: "3.4.90" } l1_cache_size: 32768 l2_cache_size: 1048576 l3_cache_size: 40370176 memory_size: 268435456 } outputs { dtype: DT_FLOAT shape { dim { size: -5 } dim { size: -83 } dim { size: -84 } dim { size: 3 } } }
Predicting... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% ETA: 0:00:00 15.8 FPS
INFO:sleap.nn.evals:Saved predictions: models/230312_144956.centroid/labels_pr.train.slp
INFO:sleap.nn.evals:Saved metrics: models/230312_144956.centroid/metrics.train.npz
INFO:sleap.nn.evals:OKS mAP: 0.980198
Predicting... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0% ETA: -:--:-- ?2023-03-12 19:24:49.825924: W tensorflow/core/grappler/costs/] Error in PredictCost() for the op: op: "CropAndResize" attr { key: "T" value { type: DT_FLOAT } } attr { key: "extrapolation_value" value { f: 0 } } attr { key: "method" value { s: "bilinear" } } inputs { dtype: DT_FLOAT shape { dim { size: -74 } dim { size: -75 } dim { size: -76 } dim { size: 1 } } } inputs { dtype: DT_FLOAT shape { dim { size: -5 } dim { size: 4 } } } inputs { dtype: DT_INT32 shape { dim { size: -5 } } } inputs { dtype: DT_INT32 shape { dim { size: 2 } } } device { type: "GPU" vendor: "NVIDIA" model: "NVIDIA A100-SXM4-40GB" frequency: 1410 num_cores: 108 environment { key: "architecture" value: "8.0" } environment { key: "cuda" value: "11020" } environment { key: "cudnn" value: "8100" } num_registers: 65536 l1_cache_size: 24576 l2_cache_size: 41943040 shared_memory_size_per_multiprocessor: 167936 memory_size: 40202993664 bandwidth: 1555200000 } outputs { dtype: DT_FLOAT shape { dim { size: -5 } dim { size: -77 } dim { size: -78 } dim { size: 1 } } }
2023-03-12 19:24:49.826309: W tensorflow/core/grappler/costs/] Error in PredictCost() for the op: op: "CropAndResize" attr { key: "T" value { type: DT_UINT8 } } attr { key: "extrapolation_value" value { f: 0 } } attr { key: "method" value { s: "bilinear" } } inputs { dtype: DT_UINT8 shape { dim { size: 4 } dim { size: 1080 } dim { size: 1920 } dim { size: 3 } } } inputs { dtype: DT_FLOAT shape { dim { size: -5 } dim { size: 4 } } } inputs { dtype: DT_INT32 shape { dim { size: -5 } } } inputs { dtype: DT_INT32 shape { dim { size: 2 } } } device { type: "CPU" vendor: "GenuineIntel" model: "101" frequency: 2200 num_cores: 12 environment { key: "cpu_instruction_set" value: "AVX SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2" } environment { key: "eigen" value: "3.4.90" } l1_cache_size: 32768 l2_cache_size: 1048576 l3_cache_size: 40370176 memory_size: 268435456 } outputs { dtype: DT_FLOAT shape { dim { size: -5 } dim { size: -83 } dim { size: -84 } dim { size: 3 } } }
Predicting... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╺━━ 93% ETA: 0:00:01 88.9 FPS2023-03-12 19:24:51.899967: W tensorflow/core/grappler/costs/] Error in PredictCost() for the op: op: "CropAndResize" attr { key: "T" value { type: DT_FLOAT } } attr { key: "extrapolation_value" value { f: 0 } } attr { key: "method" value { s: "bilinear" } } inputs { dtype: DT_FLOAT shape { dim { size: -74 } dim { size: -75 } dim { size: -76 } dim { size: 1 } } } inputs { dtype: DT_FLOAT shape { dim { size: -5 } dim { size: 4 } } } inputs { dtype: DT_INT32 shape { dim { size: -5 } } } inputs { dtype: DT_INT32 shape { dim { size: 2 } } } device { type: "GPU" vendor: "NVIDIA" model: "NVIDIA A100-SXM4-40GB" frequency: 1410 num_cores: 108 environment { key: "architecture" value: "8.0" } environment { key: "cuda" value: "11020" } environment { key: "cudnn" value: "8100" } num_registers: 65536 l1_cache_size: 24576 l2_cache_size: 41943040 shared_memory_size_per_multiprocessor: 167936 memory_size: 40202993664 bandwidth: 1555200000 } outputs { dtype: DT_FLOAT shape { dim { size: -5 } dim { size: -77 } dim { size: -78 } dim { size: 1 } } }
2023-03-12 19:24:51.900339: W tensorflow/core/grappler/costs/] Error in PredictCost() for the op: op: "CropAndResize" attr { key: "T" value { type: DT_UINT8 } } attr { key: "extrapolation_value" value { f: 0 } } attr { key: "method" value { s: "bilinear" } } inputs { dtype: DT_UINT8 shape { dim { size: 2 } dim { size: 1080 } dim { size: 1920 } dim { size: 3 } } } inputs { dtype: DT_FLOAT shape { dim { size: -5 } dim { size: 4 } } } inputs { dtype: DT_INT32 shape { dim { size: -5 } } } inputs { dtype: DT_INT32 shape { dim { size: 2 } } } device { type: "CPU" vendor: "GenuineIntel" model: "101" frequency: 2200 num_cores: 12 environment { key: "cpu_instruction_set" value: "AVX SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2" } environment { key: "eigen" value: "3.4.90" } l1_cache_size: 32768 l2_cache_size: 1048576 l3_cache_size: 40370176 memory_size: 268435456 } outputs { dtype: DT_FLOAT shape { dim { size: -5 } dim { size: -83 } dim { size: -84 } dim { size: 3 } } }
Predicting... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% ETA: 0:00:00 10.7 FPS
INFO:sleap.nn.evals:Saved predictions: models/230312_144956.centroid/labels_pr.val.slp
INFO:sleap.nn.evals:Saved metrics: models/230312_144956.centroid/metrics.val.npz
INFO:sleap.nn.evals:OKS mAP: 0.930693
GPU 0 with 40510 MiB of free memory.
GPU 0 for acceleration.
GPU memory pre-allocation.
GPUs: 1/1 available
Device: /physical_device:GPU:0
Available: True
Initalized: False
Memory growth: True
trainer...
training labels from: resolved_skeletons_with_predictions.pkg.slp
training and validation splits from validation fraction: 0.1
Splits: Training = 271 / Validation = 30.
up for training...
up pipeline builders...
up model...
test pipeline...
2023-03-12 19:25:06.978157: W tensorflow/core/grappler/costs/] Error in PredictCost() for the op: op: "CropAndResize" attr { key: "T" value { type: DT_FLOAT } } attr { key: "extrapolation_value" value { f: 0 } } attr { key: "method" value { s: "bilinear" } } inputs { dtype: DT_FLOAT shape { dim { size: 1 } dim { size: 1080 } dim { size: 1920 } dim { size: 3 } } } inputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: 4 } } } inputs { dtype: DT_INT32 shape { dim { size: -2 } } } inputs { dtype: DT_INT32 shape { dim { size: 2 } } } device { type: "CPU" vendor: "GenuineIntel" model: "101" frequency: 2200 num_cores: 12 environment { key: "cpu_instruction_set" value: "AVX SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2" } environment { key: "eigen" value: "3.4.90" } l1_cache_size: 32768 l2_cache_size: 1048576 l3_cache_size: 40370176 memory_size: 268435456 } outputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: 512 } dim { size: 512 } dim { size: 3 } } }
test example. [3.324s]
Input shape: (512, 512, 3)
Keras model.
Backbone: UNet(stacks=1, filters=24, filters_rate=2.0, kernel_size=3, stem_kernel_size=7, convs_per_block=2, stem_blocks=0, down_blocks=4, middle_block=True, up_blocks=2, up_interpolate=True, block_contraction=False)
Max stride: 16
Parameters: 4,313,235
Heads:
[0] = CenteredInstanceConfmapsHead(part_names=['prosoma', 'pedicel', 'opisthosoma', 'pedipalpR1', 'pedipalpL1', 'antlegR1', 'antlegR2', 'antlegL1', 'antlegL2', 'forelegR1', 'forelegR2', 'forelegL1', 'forelegL2', 'midlegR1', 'midlegR2', 'midlegL1', 'midlegL2', 'hindlegR1', 'hindlegR2', 'hindlegL1', 'hindlegL2', 'pedipalpR2', 'pedipalpL2', 'antlegR3', 'antlegR4', 'antlegL3', 'antlegL4'], anchor_part='pedicel', sigma=2.5, output_stride=4, loss_weight=1.0)
Outputs:
[0] = KerasTensor(type_spec=TensorSpec(shape=(None, 128, 128, 27), dtype=tf.float32, name=None), name='CenteredInstanceConfmapsHead/BiasAdd:0', description="created by layer 'CenteredInstanceConfmapsHead'")
up data pipelines...
set: n = 271
set: n = 30
up optimization...
OHKM enabled: HardKeypointMiningConfig(online_mining=True, hard_to_easy_ratio=2.0, min_hard_keypoints=3, max_hard_keypoints=None, loss_scale=5.0)
Learning rate schedule: LearningRateScheduleConfig(reduce_on_plateau=True, reduction_factor=0.5, plateau_min_delta=1e-06, plateau_patience=5, plateau_cooldown=3, min_learning_rate=1e-08)
Early stopping: EarlyStoppingConfig(stop_training_on_plateau=True, plateau_min_delta=1e-08, plateau_patience=10)
up outputs...
run path: models/230312_144956.centered_instance
up visualization...
2023-03-12 19:25:08.628362: W tensorflow/core/grappler/costs/] Error in PredictCost() for the op: op: "CropAndResize" attr { key: "T" value { type: DT_FLOAT } } attr { key: "extrapolation_value" value { f: 0 } } attr { key: "method" value { s: "bilinear" } } inputs { dtype: DT_FLOAT shape { dim { size: 1 } dim { size: 1080 } dim { size: 1920 } dim { size: 3 } } } inputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: 4 } } } inputs { dtype: DT_INT32 shape { dim { size: -2 } } } inputs { dtype: DT_INT32 shape { dim { size: 2 } } } device { type: "CPU" vendor: "GenuineIntel" model: "101" frequency: 2200 num_cores: 12 environment { key: "cpu_instruction_set" value: "AVX SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2" } environment { key: "eigen" value: "3.4.90" } l1_cache_size: 32768 l2_cache_size: 1048576 l3_cache_size: 40370176 memory_size: 268435456 } outputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: 512 } dim { size: 512 } dim { size: 3 } } }
2023-03-12 19:25:09.832672: W tensorflow/core/grappler/costs/] Error in PredictCost() for the op: op: "CropAndResize" attr { key: "T" value { type: DT_FLOAT } } attr { key: "extrapolation_value" value { f: 0 } } attr { key: "method" value { s: "bilinear" } } inputs { dtype: DT_FLOAT shape { dim { size: 1 } dim { size: 1080 } dim { size: 1920 } dim { size: 3 } } } inputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: 4 } } } inputs { dtype: DT_INT32 shape { dim { size: -2 } } } inputs { dtype: DT_INT32 shape { dim { size: 2 } } } device { type: "CPU" vendor: "GenuineIntel" model: "101" frequency: 2200 num_cores: 12 environment { key: "cpu_instruction_set" value: "AVX SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2" } environment { key: "eigen" value: "3.4.90" } l1_cache_size: 32768 l2_cache_size: 1048576 l3_cache_size: 40370176 memory_size: 268435456 } outputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: 512 } dim { size: 512 } dim { size: 3 } } }
Unable to use Qt backend for matplotlib. This probably means Qt is running headless.
Unable to use Qt backend for matplotlib. This probably means Qt is running headless.
trainer set up. [6.1s]
for training data generation...
2023-03-12 19:25:21.965479: W tensorflow/core/grappler/costs/] Error in PredictCost() for the op: op: "CropAndResize" attr { key: "T" value { type: DT_FLOAT } } attr { key: "extrapolation_value" value { f: 0 } } attr { key: "method" value { s: "bilinear" } } inputs { dtype: DT_FLOAT shape { dim { size: 1 } dim { size: 1080 } dim { size: 1920 } dim { size: 3 } } } inputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: 4 } } } inputs { dtype: DT_INT32 shape { dim { size: -2 } } } inputs { dtype: DT_INT32 shape { dim { size: 2 } } } device { type: "CPU" vendor: "GenuineIntel" model: "101" frequency: 2200 num_cores: 12 environment { key: "cpu_instruction_set" value: "AVX SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2" } environment { key: "eigen" value: "3.4.90" } l1_cache_size: 32768 l2_cache_size: 1048576 l3_cache_size: 40370176 memory_size: 268435456 } outputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: 512 } dim { size: 512 } dim { size: 3 } } }
2023-03-12 19:25:25.098796: W tensorflow/core/grappler/costs/] Error in PredictCost() for the op: op: "CropAndResize" attr { key: "T" value { type: DT_FLOAT } } attr { key: "extrapolation_value" value { f: 0 } } attr { key: "method" value { s: "bilinear" } } inputs { dtype: DT_FLOAT shape { dim { size: 1 } dim { size: 1080 } dim { size: 1920 } dim { size: 3 } } } inputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: 4 } } } inputs { dtype: DT_INT32 shape { dim { size: -2 } } } inputs { dtype: DT_INT32 shape { dim { size: 2 } } } device { type: "CPU" vendor: "GenuineIntel" model: "101" frequency: 2200 num_cores: 12 environment { key: "cpu_instruction_set" value: "AVX SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2" } environment { key: "eigen" value: "3.4.90" } l1_cache_size: 32768 l2_cache_size: 1048576 l3_cache_size: 40370176 memory_size: 268435456 } outputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: 512 } dim { size: 512 } dim { size: 3 } } }
creating training datasets. [15.6s]
training loop...
2023-03-12 19:25:25.834120: W tensorflow/core/grappler/costs/] Error in PredictCost() for the op: op: "CropAndResize" attr { key: "T" value { type: DT_FLOAT } } attr { key: "extrapolation_value" value { f: 0 } } attr { key: "method" value { s: "bilinear" } } inputs { dtype: DT_FLOAT shape { dim { size: 1 } dim { size: 1080 } dim { size: 1920 } dim { size: 3 } } } inputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: 4 } } } inputs { dtype: DT_INT32 shape { dim { size: -2 } } } inputs { dtype: DT_INT32 shape { dim { size: 2 } } } device { type: "CPU" vendor: "GenuineIntel" model: "101" frequency: 2200 num_cores: 12 environment { key: "cpu_instruction_set" value: "AVX SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2" } environment { key: "eigen" value: "3.4.90" } l1_cache_size: 32768 l2_cache_size: 1048576 l3_cache_size: 40370176 memory_size: 268435456 } outputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: 512 } dim { size: 512 } dim { size: 3 } } }
Epoch 1/200
2023-03-12 19:25:51.413321: W tensorflow/core/grappler/costs/] Error in PredictCost() for the op: op: "CropAndResize" attr { key: "T" value { type: DT_FLOAT } } attr { key: "extrapolation_value" value { f: 0 } } attr { key: "method" value { s: "bilinear" } } inputs { dtype: DT_FLOAT shape { dim { size: 1 } dim { size: 1080 } dim { size: 1920 } dim { size: 3 } } } inputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: 4 } } } inputs { dtype: DT_INT32 shape { dim { size: -2 } } } inputs { dtype: DT_INT32 shape { dim { size: 2 } } } device { type: "CPU" vendor: "GenuineIntel" model: "101" frequency: 2200 num_cores: 12 environment { key: "cpu_instruction_set" value: "AVX SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2" } environment { key: "eigen" value: "3.4.90" } l1_cache_size: 32768 l2_cache_size: 1048576 l3_cache_size: 40370176 memory_size: 268435456 } outputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: 512 } dim { size: 512 } dim { size: 3 } } }
2023-03-12 19:25:56.975111: W tensorflow/core/grappler/costs/] Error in PredictCost() for the op: op: "CropAndResize" attr { key: "T" value { type: DT_FLOAT } } attr { key: "extrapolation_value" value { f: 0 } } attr { key: "method" value { s: "bilinear" } } inputs { dtype: DT_FLOAT shape { dim { size: 27 } dim { size: 128 } dim { size: 128 } dim { size: 1 } } } inputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: 4 } } } inputs { dtype: DT_INT32 shape { dim { size: -2 } } } inputs { dtype: DT_INT32 shape { dim { size: 2 } } } device { type: "GPU" vendor: "NVIDIA" model: "NVIDIA A100-SXM4-40GB" frequency: 1410 num_cores: 108 environment { key: "architecture" value: "8.0" } environment { key: "cuda" value: "11020" } environment { key: "cudnn" value: "8100" } num_registers: 65536 l1_cache_size: 24576 l2_cache_size: 41943040 shared_memory_size_per_multiprocessor: 167936 memory_size: 40202993664 bandwidth: 1555200000 } outputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: -5 } dim { size: -6 } dim { size: 1 } } }
200/200 - 32s - loss: 0.0071 - ohkm: 0.0060 - prosoma: 0.0012 - pedicel: 0.0012 - opisthosoma: 0.0012 - pedipalpR1: 0.0012 - pedipalpL1: 0.0012 - antlegR1: 0.0012 - antlegR2: 0.0012 - antlegL1: 0.0012 - antlegL2: 0.0012 - forelegR1: 0.0012 - forelegR2: 0.0012 - forelegL1: 0.0012 - forelegL2: 0.0012 - midlegR1: 0.0012 - midlegR2: 0.0012 - midlegL1: 0.0012 - midlegL2: 0.0012 - hindlegR1: 0.0012 - hindlegR2: 0.0012 - hindlegL1: 0.0012 - hindlegL2: 0.0012 - pedipalpR2: 0.0012 - pedipalpL2: 0.0012 - antlegR3: 0.0012 - antlegR4: 0.0011 - antlegL3: 0.0012 - antlegL4: 0.0011 - val_loss: 0.0071 - val_ohkm: 0.0060 - val_prosoma: 0.0011 - val_pedicel: 0.0012 - val_opisthosoma: 0.0012 - val_pedipalpR1: 0.0011 - val_pedipalpL1: 0.0011 - val_antlegR1: 0.0012 - val_antlegR2: 0.0012 - val_antlegL1: 0.0011 - val_antlegL2: 0.0012 - val_forelegR1: 0.0012 - val_forelegR2: 0.0012 - val_forelegL1: 0.0011 - val_forelegL2: 0.0012 - val_midlegR1: 0.0012 - val_midlegR2: 0.0012 - val_midlegL1: 0.0012 - val_midlegL2: 0.0012 - val_hindlegR1: 0.0012 - val_hindlegR2: 0.0012 - val_hindlegL1: 0.0011 - val_hindlegL2: 0.0012 - val_pedipalpR2: 0.0012 - val_pedipalpL2: 0.0012 - val_antlegR3: 0.0012 - val_antlegR4: 0.0011 - val_antlegL3: 0.0012 - val_antlegL4: 0.0011 - lr: 1.0000e-04 - 32s/epoch - 160ms/step
Epoch 2/200
2023-03-12 19:26:14.838946: W tensorflow/core/grappler/costs/] Error in PredictCost() for the op: op: "CropAndResize" attr { key: "T" value { type: DT_FLOAT } } attr { key: "extrapolation_value" value { f: 0 } } attr { key: "method" value { s: "bilinear" } } inputs { dtype: DT_FLOAT shape { dim { size: 1 } dim { size: 1080 } dim { size: 1920 } dim { size: 3 } } } inputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: 4 } } } inputs { dtype: DT_INT32 shape { dim { size: -2 } } } inputs { dtype: DT_INT32 shape { dim { size: 2 } } } device { type: "CPU" vendor: "GenuineIntel" model: "101" frequency: 2200 num_cores: 12 environment { key: "cpu_instruction_set" value: "AVX SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2" } environment { key: "eigen" value: "3.4.90" } l1_cache_size: 32768 l2_cache_size: 1048576 l3_cache_size: 40370176 memory_size: 268435456 } outputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: 512 } dim { size: 512 } dim { size: 3 } } }
200/200 - 20s - loss: 0.0071 - ohkm: 0.0060 - prosoma: 0.0012 - pedicel: 0.0012 - opisthosoma: 0.0012 - pedipalpR1: 0.0012 - pedipalpL1: 0.0012 - antlegR1: 0.0012 - antlegR2: 0.0012 - antlegL1: 0.0012 - antlegL2: 0.0012 - forelegR1: 0.0012 - forelegR2: 0.0012 - forelegL1: 0.0012 - forelegL2: 0.0012 - midlegR1: 0.0012 - midlegR2: 0.0012 - midlegL1: 0.0012 - midlegL2: 0.0012 - hindlegR1: 0.0012 - hindlegR2: 0.0012 - hindlegL1: 0.0012 - hindlegL2: 0.0012 - pedipalpR2: 0.0012 - pedipalpL2: 0.0012 - antlegR3: 0.0012 - antlegR4: 0.0011 - antlegL3: 0.0012 - antlegL4: 0.0011 - val_loss: 0.0071 - val_ohkm: 0.0060 - val_prosoma: 0.0012 - val_pedicel: 0.0012 - val_opisthosoma: 0.0012 - val_pedipalpR1: 0.0011 - val_pedipalpL1: 0.0011 - val_antlegR1: 0.0012 - val_antlegR2: 0.0012 - val_antlegL1: 0.0012 - val_antlegL2: 0.0012 - val_forelegR1: 0.0012 - val_forelegR2: 0.0012 - val_forelegL1: 0.0012 - val_forelegL2: 0.0012 - val_midlegR1: 0.0012 - val_midlegR2: 0.0012 - val_midlegL1: 0.0012 - val_midlegL2: 0.0012 - val_hindlegR1: 0.0012 - val_hindlegR2: 0.0012 - val_hindlegL1: 0.0012 - val_hindlegL2: 0.0012 - val_pedipalpR2: 0.0012 - val_pedipalpL2: 0.0012 - val_antlegR3: 0.0011 - val_antlegR4: 9.8390e-04 - val_antlegL3: 0.0011 - val_antlegL4: 0.0011 - lr: 1.0000e-04 - 20s/epoch - 98ms/step
Epoch 3/200
2023-03-12 19:26:35.598997: W tensorflow/core/grappler/costs/] Error in PredictCost() for the op: op: "CropAndResize" attr { key: "T" value { type: DT_FLOAT } } attr { key: "extrapolation_value" value { f: 0 } } attr { key: "method" value { s: "bilinear" } } inputs { dtype: DT_FLOAT shape { dim { size: 1 } dim { size: 1080 } dim { size: 1920 } dim { size: 3 } } } inputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: 4 } } } inputs { dtype: DT_INT32 shape { dim { size: -2 } } } inputs { dtype: DT_INT32 shape { dim { size: 2 } } } device { type: "CPU" vendor: "GenuineIntel" model: "101" frequency: 2200 num_cores: 12 environment { key: "cpu_instruction_set" value: "AVX SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2" } environment { key: "eigen" value: "3.4.90" } l1_cache_size: 32768 l2_cache_size: 1048576 l3_cache_size: 40370176 memory_size: 268435456 } outputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: 512 } dim { size: 512 } dim { size: 3 } } }
200/200 - 22s - loss: 0.0071 - ohkm: 0.0059 - prosoma: 0.0011 - pedicel: 0.0011 - opisthosoma: 0.0012 - pedipalpR1: 0.0011 - pedipalpL1: 0.0011 - antlegR1: 0.0011 - antlegR2: 0.0012 - antlegL1: 0.0011 - antlegL2: 0.0012 - forelegR1: 0.0011 - forelegR2: 0.0012 - forelegL1: 0.0011 - forelegL2: 0.0012 - midlegR1: 0.0011 - midlegR2: 0.0012 - midlegL1: 0.0011 - midlegL2: 0.0012 - hindlegR1: 0.0011 - hindlegR2: 0.0012 - hindlegL1: 0.0011 - hindlegL2: 0.0012 - pedipalpR2: 0.0012 - pedipalpL2: 0.0012 - antlegR3: 0.0012 - antlegR4: 0.0011 - antlegL3: 0.0012 - antlegL4: 0.0011 - val_loss: 0.0070 - val_ohkm: 0.0059 - val_prosoma: 0.0011 - val_pedicel: 0.0011 - val_opisthosoma: 0.0012 - val_pedipalpR1: 0.0011 - val_pedipalpL1: 0.0011 - val_antlegR1: 0.0011 - val_antlegR2: 0.0012 - val_antlegL1: 0.0011 - val_antlegL2: 0.0012 - val_forelegR1: 0.0011 - val_forelegR2: 0.0012 - val_forelegL1: 0.0011 - val_forelegL2: 0.0012 - val_midlegR1: 0.0012 - val_midlegR2: 0.0012 - val_midlegL1: 0.0011 - val_midlegL2: 0.0012 - val_hindlegR1: 0.0012 - val_hindlegR2: 0.0012 - val_hindlegL1: 0.0011 - val_hindlegL2: 0.0012 - val_pedipalpR2: 0.0012 - val_pedipalpL2: 0.0012 - val_antlegR3: 0.0011 - val_antlegR4: 9.6768e-04 - val_antlegL3: 0.0011 - val_antlegL4: 0.0011 - lr: 1.0000e-04 - 22s/epoch - 110ms/step
Epoch 4/200
2023-03-12 19:26:56.726952: W tensorflow/core/grappler/costs/] Error in PredictCost() for the op: op: "CropAndResize" attr { key: "T" value { type: DT_FLOAT } } attr { key: "extrapolation_value" value { f: 0 } } attr { key: "method" value { s: "bilinear" } } inputs { dtype: DT_FLOAT shape { dim { size: 1 } dim { size: 1080 } dim { size: 1920 } dim { size: 3 } } } inputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: 4 } } } inputs { dtype: DT_INT32 shape { dim { size: -2 } } } inputs { dtype: DT_INT32 shape { dim { size: 2 } } } device { type: "CPU" vendor: "GenuineIntel" model: "101" frequency: 2200 num_cores: 12 environment { key: "cpu_instruction_set" value: "AVX SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2" } environment { key: "eigen" value: "3.4.90" } l1_cache_size: 32768 l2_cache_size: 1048576 l3_cache_size: 40370176 memory_size: 268435456 } outputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: 512 } dim { size: 512 } dim { size: 3 } } }
200/200 - 21s - loss: 0.0070 - ohkm: 0.0059 - prosoma: 0.0011 - pedicel: 0.0011 - opisthosoma: 0.0012 - pedipalpR1: 0.0011 - pedipalpL1: 0.0011 - antlegR1: 0.0011 - antlegR2: 0.0012 - antlegL1: 0.0011 - antlegL2: 0.0011 - forelegR1: 0.0011 - forelegR2: 0.0012 - forelegL1: 0.0011 - forelegL2: 0.0012 - midlegR1: 0.0011 - midlegR2: 0.0012 - midlegL1: 0.0011 - midlegL2: 0.0012 - hindlegR1: 0.0011 - hindlegR2: 0.0012 - hindlegL1: 0.0011 - hindlegL2: 0.0012 - pedipalpR2: 0.0011 - pedipalpL2: 0.0011 - antlegR3: 0.0011 - antlegR4: 0.0010 - antlegL3: 0.0011 - antlegL4: 0.0011 - val_loss: 0.0070 - val_ohkm: 0.0058 - val_prosoma: 0.0011 - val_pedicel: 0.0011 - val_opisthosoma: 0.0011 - val_pedipalpR1: 0.0011 - val_pedipalpL1: 0.0011 - val_antlegR1: 0.0011 - val_antlegR2: 0.0011 - val_antlegL1: 0.0011 - val_antlegL2: 0.0011 - val_forelegR1: 0.0011 - val_forelegR2: 0.0011 - val_forelegL1: 0.0011 - val_forelegL2: 0.0012 - val_midlegR1: 0.0011 - val_midlegR2: 0.0011 - val_midlegL1: 0.0011 - val_midlegL2: 0.0012 - val_hindlegR1: 0.0011 - val_hindlegR2: 0.0011 - val_hindlegL1: 0.0011 - val_hindlegL2: 0.0011 - val_pedipalpR2: 0.0011 - val_pedipalpL2: 0.0011 - val_antlegR3: 0.0011 - val_antlegR4: 9.7633e-04 - val_antlegL3: 0.0011 - val_antlegL4: 0.0010 - lr: 1.0000e-04 - 21s/epoch - 107ms/step
Epoch 5/200
2023-03-12 19:27:18.484206: W tensorflow/core/grappler/costs/] Error in PredictCost() for the op: op: "CropAndResize" attr { key: "T" value { type: DT_FLOAT } } attr { key: "extrapolation_value" value { f: 0 } } attr { key: "method" value { s: "bilinear" } } inputs { dtype: DT_FLOAT shape { dim { size: 1 } dim { size: 1080 } dim { size: 1920 } dim { size: 3 } } } inputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: 4 } } } inputs { dtype: DT_INT32 shape { dim { size: -2 } } } inputs { dtype: DT_INT32 shape { dim { size: 2 } } } device { type: "CPU" vendor: "GenuineIntel" model: "101" frequency: 2200 num_cores: 12 environment { key: "cpu_instruction_set" value: "AVX SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2" } environment { key: "eigen" value: "3.4.90" } l1_cache_size: 32768 l2_cache_size: 1048576 l3_cache_size: 40370176 memory_size: 268435456 } outputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: 512 } dim { size: 512 } dim { size: 3 } } }
200/200 - 21s - loss: 0.0069 - ohkm: 0.0058 - prosoma: 0.0011 - pedicel: 0.0010 - opisthosoma: 0.0011 - pedipalpR1: 0.0011 - pedipalpL1: 0.0011 - antlegR1: 0.0011 - antlegR2: 0.0011 - antlegL1: 0.0011 - antlegL2: 0.0011 - forelegR1: 0.0011 - forelegR2: 0.0011 - forelegL1: 0.0011 - forelegL2: 0.0011 - midlegR1: 0.0011 - midlegR2: 0.0011 - midlegL1: 0.0011 - midlegL2: 0.0011 - hindlegR1: 0.0011 - hindlegR2: 0.0011 - hindlegL1: 0.0011 - hindlegL2: 0.0011 - pedipalpR2: 0.0011 - pedipalpL2: 0.0011 - antlegR3: 0.0011 - antlegR4: 0.0010 - antlegL3: 0.0011 - antlegL4: 0.0011 - val_loss: 0.0069 - val_ohkm: 0.0058 - val_prosoma: 0.0011 - val_pedicel: 0.0010 - val_opisthosoma: 0.0011 - val_pedipalpR1: 0.0011 - val_pedipalpL1: 0.0011 - val_antlegR1: 0.0011 - val_antlegR2: 0.0011 - val_antlegL1: 0.0011 - val_antlegL2: 0.0011 - val_forelegR1: 0.0011 - val_forelegR2: 0.0011 - val_forelegL1: 0.0011 - val_forelegL2: 0.0011 - val_midlegR1: 0.0011 - val_midlegR2: 0.0011 - val_midlegL1: 0.0011 - val_midlegL2: 0.0012 - val_hindlegR1: 0.0011 - val_hindlegR2: 0.0011 - val_hindlegL1: 0.0011 - val_hindlegL2: 0.0011 - val_pedipalpR2: 0.0011 - val_pedipalpL2: 0.0011 - val_antlegR3: 0.0011 - val_antlegR4: 9.5325e-04 - val_antlegL3: 0.0011 - val_antlegL4: 0.0011 - lr: 1.0000e-04 - 21s/epoch - 107ms/step
Epoch 6/200
2023-03-12 19:27:39.438895: W tensorflow/core/grappler/costs/] Error in PredictCost() for the op: op: "CropAndResize" attr { key: "T" value { type: DT_FLOAT } } attr { key: "extrapolation_value" value { f: 0 } } attr { key: "method" value { s: "bilinear" } } inputs { dtype: DT_FLOAT shape { dim { size: 1 } dim { size: 1080 } dim { size: 1920 } dim { size: 3 } } } inputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: 4 } } } inputs { dtype: DT_INT32 shape { dim { size: -2 } } } inputs { dtype: DT_INT32 shape { dim { size: 2 } } } device { type: "CPU" vendor: "GenuineIntel" model: "101" frequency: 2200 num_cores: 12 environment { key: "cpu_instruction_set" value: "AVX SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2" } environment { key: "eigen" value: "3.4.90" } l1_cache_size: 32768 l2_cache_size: 1048576 l3_cache_size: 40370176 memory_size: 268435456 } outputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: 512 } dim { size: 512 } dim { size: 3 } } }
200/200 - 21s - loss: 0.0068 - ohkm: 0.0058 - prosoma: 9.7589e-04 - pedicel: 8.8568e-04 - opisthosoma: 0.0011 - pedipalpR1: 0.0011 - pedipalpL1: 0.0011 - antlegR1: 0.0011 - antlegR2: 0.0011 - antlegL1: 0.0011 - antlegL2: 0.0011 - forelegR1: 0.0011 - forelegR2: 0.0011 - forelegL1: 0.0011 - forelegL2: 0.0011 - midlegR1: 0.0011 - midlegR2: 0.0011 - midlegL1: 0.0011 - midlegL2: 0.0011 - hindlegR1: 0.0011 - hindlegR2: 0.0011 - hindlegL1: 0.0011 - hindlegL2: 0.0011 - pedipalpR2: 0.0010 - pedipalpL2: 0.0010 - antlegR3: 0.0011 - antlegR4: 0.0010 - antlegL3: 0.0011 - antlegL4: 0.0011 - val_loss: 0.0068 - val_ohkm: 0.0057 - val_prosoma: 9.5023e-04 - val_pedicel: 8.6653e-04 - val_opisthosoma: 0.0011 - val_pedipalpR1: 0.0011 - val_pedipalpL1: 0.0011 - val_antlegR1: 0.0011 - val_antlegR2: 0.0011 - val_antlegL1: 0.0011 - val_antlegL2: 0.0011 - val_forelegR1: 0.0011 - val_forelegR2: 0.0011 - val_forelegL1: 0.0011 - val_forelegL2: 0.0011 - val_midlegR1: 0.0011 - val_midlegR2: 0.0011 - val_midlegL1: 0.0011 - val_midlegL2: 0.0011 - val_hindlegR1: 0.0011 - val_hindlegR2: 0.0011 - val_hindlegL1: 0.0011 - val_hindlegL2: 0.0011 - val_pedipalpR2: 0.0010 - val_pedipalpL2: 0.0010 - val_antlegR3: 0.0011 - val_antlegR4: 9.0389e-04 - val_antlegL3: 0.0011 - val_antlegL4: 0.0010 - lr: 1.0000e-04 - 21s/epoch - 104ms/step
Epoch 7/200
2023-03-12 19:28:00.736129: W tensorflow/core/grappler/costs/] Error in PredictCost() for the op: op: "CropAndResize" attr { key: "T" value { type: DT_FLOAT } } attr { key: "extrapolation_value" value { f: 0 } } attr { key: "method" value { s: "bilinear" } } inputs { dtype: DT_FLOAT shape { dim { size: 1 } dim { size: 1080 } dim { size: 1920 } dim { size: 3 } } } inputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: 4 } } } inputs { dtype: DT_INT32 shape { dim { size: -2 } } } inputs { dtype: DT_INT32 shape { dim { size: 2 } } } device { type: "CPU" vendor: "GenuineIntel" model: "101" frequency: 2200 num_cores: 12 environment { key: "cpu_instruction_set" value: "AVX SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2" } environment { key: "eigen" value: "3.4.90" } l1_cache_size: 32768 l2_cache_size: 1048576 l3_cache_size: 40370176 memory_size: 268435456 } outputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: 512 } dim { size: 512 } dim { size: 3 } } }
200/200 - 23s - loss: 0.0068 - ohkm: 0.0057 - prosoma: 9.0675e-04 - pedicel: 8.2878e-04 - opisthosoma: 0.0011 - pedipalpR1: 0.0011 - pedipalpL1: 0.0011 - antlegR1: 0.0011 - antlegR2: 0.0011 - antlegL1: 0.0011 - antlegL2: 0.0011 - forelegR1: 0.0011 - forelegR2: 0.0011 - forelegL1: 0.0011 - forelegL2: 0.0011 - midlegR1: 0.0011 - midlegR2: 0.0011 - midlegL1: 0.0011 - midlegL2: 0.0011 - hindlegR1: 0.0011 - hindlegR2: 0.0011 - hindlegL1: 0.0011 - hindlegL2: 0.0011 - pedipalpR2: 9.7089e-04 - pedipalpL2: 9.6632e-04 - antlegR3: 0.0011 - antlegR4: 0.0010 - antlegL3: 0.0011 - antlegL4: 0.0011 - val_loss: 0.0067 - val_ohkm: 0.0057 - val_prosoma: 8.2120e-04 - val_pedicel: 7.7487e-04 - val_opisthosoma: 0.0010 - val_pedipalpR1: 0.0011 - val_pedipalpL1: 0.0011 - val_antlegR1: 0.0011 - val_antlegR2: 0.0011 - val_antlegL1: 0.0011 - val_antlegL2: 0.0011 - val_forelegR1: 0.0011 - val_forelegR2: 0.0011 - val_forelegL1: 0.0011 - val_forelegL2: 0.0011 - val_midlegR1: 0.0011 - val_midlegR2: 0.0011 - val_midlegL1: 0.0011 - val_midlegL2: 0.0011 - val_hindlegR1: 0.0011 - val_hindlegR2: 0.0011 - val_hindlegL1: 0.0011 - val_hindlegL2: 0.0011 - val_pedipalpR2: 8.7347e-04 - val_pedipalpL2: 9.0317e-04 - val_antlegR3: 0.0011 - val_antlegR4: 9.1410e-04 - val_antlegL3: 0.0010 - val_antlegL4: 0.0010 - lr: 1.0000e-04 - 23s/epoch - 113ms/step
... Output truncated for the remaining training epochs. Note that the PredictCost() warning is printed at every epoch ...
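Because the PredictCost() warning repeats at every epoch, the per-epoch metrics are hard to pick out of a log like this one. The following is a minimal sketch of one way to filter a pasted log, assuming the line shapes shown above (a warning containing `Error in PredictCost()` and a Keras summary line starting with a step counter such as `200/200 - `). The helper names `drop_predictcost_warnings` and `parse_epoch_line` are hypothetical and are not part of SLEAP or TensorFlow.

```python
import re

# One "name: value" pair in a Keras epoch summary line, e.g. "loss: 0.0071"
# or "val_antlegR4: 9.8390e-04".
METRIC_RE = re.compile(r"(\w+): ([0-9.]+(?:e[+-]?\d+)?)")

# Epoch summary lines start with a step counter such as "200/200 - ".
EPOCH_LINE_RE = re.compile(r"^\d+/\d+ - ")


def drop_predictcost_warnings(lines):
    """Return the log lines that are not grappler PredictCost() warnings."""
    return [ln for ln in lines if "Error in PredictCost()" not in ln]


def parse_epoch_line(line):
    """Parse a Keras epoch summary line into {metric_name: float}.

    Returns an empty dict for any line that is not an epoch summary.
    """
    if not EPOCH_LINE_RE.match(line):
        return {}
    return {name: float(value) for name, value in METRIC_RE.findall(line)}


if __name__ == "__main__":
    log = [
        "Epoch 1/200",
        '2023-03-12 19:25:51.413321: W tensorflow/core/grappler/costs/] '
        'Error in PredictCost() for the op: op: "CropAndResize"',
        "200/200 - 32s - loss: 0.0071 - val_loss: 0.0071 - lr: 1.0000e-04 "
        "- 32s/epoch - 160ms/step",
    ]
    for ln in drop_predictcost_warnings(log):
        metrics = parse_epoch_line(ln)
        if metrics:
            print(metrics["loss"], metrics["val_loss"], metrics["lr"])
```

This only post-processes the text of the log; it does not change what TensorFlow prints during training.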
Epoch 49/200
2023-03-12 19:42:29.306173: W tensorflow/core/grappler/costs/] Error in PredictCost() for the op: op: "CropAndResize" attr { key: "T" value { type: DT_FLOAT } } attr { key: "extrapolation_value" value { f: 0 } } attr { key: "method" value { s: "bilinear" } } inputs { dtype: DT_FLOAT shape { dim { size: 1 } dim { size: 1080 } dim { size: 1920 } dim { size: 3 } } } inputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: 4 } } } inputs { dtype: DT_INT32 shape { dim { size: -2 } } } inputs { dtype: DT_INT32 shape { dim { size: 2 } } } device { type: "CPU" vendor: "GenuineIntel" model: "101" frequency: 2200 num_cores: 12 environment { key: "cpu_instruction_set" value: "AVX SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2" } environment { key: "eigen" value: "3.4.90" } l1_cache_size: 32768 l2_cache_size: 1048576 l3_cache_size: 40370176 memory_size: 268435456 } outputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: 512 } dim { size: 512 } dim { size: 3 } } }
200/200 - 20s - loss: 0.0031 - ohkm: 0.0027 - prosoma: 3.3843e-04 - pedicel: 3.0595e-04 - opisthosoma: 2.9900e-04 - pedipalpR1: 3.7595e-04 - pedipalpL1: 3.7146e-04 - antlegR1: 4.9511e-04 - antlegR2: 4.9294e-04 - antlegL1: 4.7181e-04 - antlegL2: 4.0852e-04 - forelegR1: 3.7959e-04 - forelegR2: 4.7230e-04 - forelegL1: 3.6725e-04 - forelegL2: 4.3950e-04 - midlegR1: 3.5101e-04 - midlegR2: 4.2124e-04 - midlegL1: 3.4595e-04 - midlegL2: 4.2020e-04 - hindlegR1: 3.6612e-04 - hindlegR2: 3.3246e-04 - hindlegL1: 3.7615e-04 - hindlegL2: 3.2629e-04 - pedipalpR2: 3.7594e-04 - pedipalpL2: 3.8471e-04 - antlegR3: 5.9053e-04 - antlegR4: 6.0350e-04 - antlegL3: 5.1578e-04 - antlegL4: 5.2723e-04 - val_loss: 0.0040 - val_ohkm: 0.0035 - val_prosoma: 4.2798e-04 - val_pedicel: 3.8819e-04 - val_opisthosoma: 3.6716e-04 - val_pedipalpR1: 4.5729e-04 - val_pedipalpL1: 4.7555e-04 - val_antlegR1: 5.9989e-04 - val_antlegR2: 6.6770e-04 - val_antlegL1: 5.8266e-04 - val_antlegL2: 4.6682e-04 - val_forelegR1: 4.9084e-04 - val_forelegR2: 5.7035e-04 - val_forelegL1: 4.2677e-04 - val_forelegL2: 5.4642e-04 - val_midlegR1: 4.6823e-04 - val_midlegR2: 5.3738e-04 - val_midlegL1: 4.1209e-04 - val_midlegL2: 5.5408e-04 - val_hindlegR1: 4.5810e-04 - val_hindlegR2: 5.1612e-04 - val_hindlegL1: 4.6348e-04 - val_hindlegL2: 4.0936e-04 - val_pedipalpR2: 4.6826e-04 - val_pedipalpL2: 4.9571e-04 - val_antlegR3: 7.8056e-04 - val_antlegR4: 7.7968e-04 - val_antlegL3: 5.9276e-04 - val_antlegL4: 6.4306e-04 - lr: 1.2500e-05 - 20s/epoch - 101ms/step
Epoch 49: early stopping
training loop. [17.1 min]
visualization directory: models/230312_144956.centered_instance/viz
evaluation metrics to model folder...
Predicting... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0% ETA: -:--:-- ?
2023-03-12 19:42:35.583321: W tensorflow/core/grappler/costs/] Error in PredictCost() for the op: op: "CropAndResize" attr { key: "T" value { type: DT_UINT8 } } attr { key: "extrapolation_value" value { f: 0 } } attr { key: "method" value { s: "bilinear" } } inputs { dtype: DT_UINT8 shape { dim { size: 4 } dim { size: 1080 } dim { size: 1920 } dim { size: 3 } } } inputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: 4 } } } inputs { dtype: DT_INT32 shape { dim { size: -2 } } } inputs { dtype: DT_INT32 shape { dim { size: 2 } } } device { type: "CPU" vendor: "GenuineIntel" model: "101" frequency: 2200 num_cores: 12 environment { key: "cpu_instruction_set" value: "AVX SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2" } environment { key: "eigen" value: "3.4.90" } l1_cache_size: 32768 l2_cache_size: 1048576 l3_cache_size: 40370176 memory_size: 268435456 } outputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: -26 } dim { size: -27 } dim { size: 3 } } }
2023-03-12 19:42:35.592771: W tensorflow/core/grappler/costs/] Error in PredictCost() for the op: op: "CropAndResize" attr { key: "T" value { type: DT_FLOAT } } attr { key: "extrapolation_value" value { f: 0 } } attr { key: "method" value { s: "bilinear" } } inputs { dtype: DT_FLOAT shape { dim { size: -52 } dim { size: -53 } dim { size: -54 } dim { size: 1 } } } inputs { dtype: DT_FLOAT shape { dim { size: -9 } dim { size: 4 } } } inputs { dtype: DT_INT32 shape { dim { size: -9 } } } inputs { dtype: DT_INT32 shape { dim { size: 2 } } } device { type: "GPU" vendor: "NVIDIA" model: "NVIDIA A100-SXM4-40GB" frequency: 1410 num_cores: 108 environment { key: "architecture" value: "8.0" } environment { key: "cuda" value: "11020" } environment { key: "cudnn" value: "8100" } num_registers: 65536 l1_cache_size: 24576 l2_cache_size: 41943040 shared_memory_size_per_multiprocessor: 167936 memory_size: 40202993664 bandwidth: 1555200000 } outputs { dtype: DT_FLOAT shape { dim { size: -9 } dim { size: -56 } dim { size: -57 } dim { size: 1 } } }
Predicting... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╸ 99% ETA: 0:00:01 26.9 FPS
2023-03-12 19:42:45.387926: W tensorflow/core/grappler/costs/] Error in PredictCost() for the op: op: "CropAndResize" attr { key: "T" value { type: DT_UINT8 } } attr { key: "extrapolation_value" value { f: 0 } } attr { key: "method" value { s: "bilinear" } } inputs { dtype: DT_UINT8 shape { dim { size: 3 } dim { size: 1080 } dim { size: 1920 } dim { size: 3 } } } inputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: 4 } } } inputs { dtype: DT_INT32 shape { dim { size: -2 } } } inputs { dtype: DT_INT32 shape { dim { size: 2 } } } device { type: "CPU" vendor: "GenuineIntel" model: "101" frequency: 2200 num_cores: 12 environment { key: "cpu_instruction_set" value: "AVX SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2" } environment { key: "eigen" value: "3.4.90" } l1_cache_size: 32768 l2_cache_size: 1048576 l3_cache_size: 40370176 memory_size: 268435456 } outputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: -26 } dim { size: -27 } dim { size: 3 } } }
2023-03-12 19:42:45.397145: W tensorflow/core/grappler/costs/] Error in PredictCost() for the op: op: "CropAndResize" attr { key: "T" value { type: DT_FLOAT } } attr { key: "extrapolation_value" value { f: 0 } } attr { key: "method" value { s: "bilinear" } } inputs { dtype: DT_FLOAT shape { dim { size: -52 } dim { size: -53 } dim { size: -54 } dim { size: 1 } } } inputs { dtype: DT_FLOAT shape { dim { size: -9 } dim { size: 4 } } } inputs { dtype: DT_INT32 shape { dim { size: -9 } } } inputs { dtype: DT_INT32 shape { dim { size: 2 } } } device { type: "GPU" vendor: "NVIDIA" model: "NVIDIA A100-SXM4-40GB" frequency: 1410 num_cores: 108 environment { key: "architecture" value: "8.0" } environment { key: "cuda" value: "11020" } environment { key: "cudnn" value: "8100" } num_registers: 65536 l1_cache_size: 24576 l2_cache_size: 41943040 shared_memory_size_per_multiprocessor: 167936 memory_size: 40202993664 bandwidth: 1555200000 } outputs { dtype: DT_FLOAT shape { dim { size: -9 } dim { size: -56 } dim { size: -57 } dim { size: 1 } } }
Predicting... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% ETA: 0:00:00 19.4 FPS
INFO:sleap.nn.evals:Saved predictions: models/230312_144956.centered_instance/labels_pr.train.slp
INFO:sleap.nn.evals:Saved metrics: models/230312_144956.centered_instance/metrics.train.npz
INFO:sleap.nn.evals:OKS mAP: 0.870044
Predicting... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0% ETA: -:--:-- ?2023-03-12 19:42:48.149569: W tensorflow/core/grappler/costs/] Error in PredictCost() for the op: op: "CropAndResize" attr { key: "T" value { type: DT_UINT8 } } attr { key: "extrapolation_value" value { f: 0 } } attr { key: "method" value { s: "bilinear" } } inputs { dtype: DT_UINT8 shape { dim { size: 4 } dim { size: 1080 } dim { size: 1920 } dim { size: 3 } } } inputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: 4 } } } inputs { dtype: DT_INT32 shape { dim { size: -2 } } } inputs { dtype: DT_INT32 shape { dim { size: 2 } } } device { type: "CPU" vendor: "GenuineIntel" model: "101" frequency: 2200 num_cores: 12 environment { key: "cpu_instruction_set" value: "AVX SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2" } environment { key: "eigen" value: "3.4.90" } l1_cache_size: 32768 l2_cache_size: 1048576 l3_cache_size: 40370176 memory_size: 268435456 } outputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: -26 } dim { size: -27 } dim { size: 3 } } }
2023-03-12 19:42:48.158952: W tensorflow/core/grappler/costs/] Error in PredictCost() for the op: op: "CropAndResize" attr { key: "T" value { type: DT_FLOAT } } attr { key: "extrapolation_value" value { f: 0 } } attr { key: "method" value { s: "bilinear" } } inputs { dtype: DT_FLOAT shape { dim { size: -52 } dim { size: -53 } dim { size: -54 } dim { size: 1 } } } inputs { dtype: DT_FLOAT shape { dim { size: -9 } dim { size: 4 } } } inputs { dtype: DT_INT32 shape { dim { size: -9 } } } inputs { dtype: DT_INT32 shape { dim { size: 2 } } } device { type: "GPU" vendor: "NVIDIA" model: "NVIDIA A100-SXM4-40GB" frequency: 1410 num_cores: 108 environment { key: "architecture" value: "8.0" } environment { key: "cuda" value: "11020" } environment { key: "cudnn" value: "8100" } num_registers: 65536 l1_cache_size: 24576 l2_cache_size: 41943040 shared_memory_size_per_multiprocessor: 167936 memory_size: 40202993664 bandwidth: 1555200000 } outputs { dtype: DT_FLOAT shape { dim { size: -9 } dim { size: -56 } dim { size: -57 } dim { size: 1 } } }
Predicting... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╺━━ 93% ETA: 0:00:01 92.7 FPS2023-03-12 19:42:49.475717: W tensorflow/core/grappler/costs/] Error in PredictCost() for the op: op: "CropAndResize" attr { key: "T" value { type: DT_UINT8 } } attr { key: "extrapolation_value" value { f: 0 } } attr { key: "method" value { s: "bilinear" } } inputs { dtype: DT_UINT8 shape { dim { size: 2 } dim { size: 1080 } dim { size: 1920 } dim { size: 3 } } } inputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: 4 } } } inputs { dtype: DT_INT32 shape { dim { size: -2 } } } inputs { dtype: DT_INT32 shape { dim { size: 2 } } } device { type: "CPU" vendor: "GenuineIntel" model: "101" frequency: 2200 num_cores: 12 environment { key: "cpu_instruction_set" value: "AVX SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2" } environment { key: "eigen" value: "3.4.90" } l1_cache_size: 32768 l2_cache_size: 1048576 l3_cache_size: 40370176 memory_size: 268435456 } outputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: -26 } dim { size: -27 } dim { size: 3 } } }
2023-03-12 19:42:49.485143: W tensorflow/core/grappler/costs/] Error in PredictCost() for the op: op: "CropAndResize" attr { key: "T" value { type: DT_FLOAT } } attr { key: "extrapolation_value" value { f: 0 } } attr { key: "method" value { s: "bilinear" } } inputs { dtype: DT_FLOAT shape { dim { size: -52 } dim { size: -53 } dim { size: -54 } dim { size: 1 } } } inputs { dtype: DT_FLOAT shape { dim { size: -9 } dim { size: 4 } } } inputs { dtype: DT_INT32 shape { dim { size: -9 } } } inputs { dtype: DT_INT32 shape { dim { size: 2 } } } device { type: "GPU" vendor: "NVIDIA" model: "NVIDIA A100-SXM4-40GB" frequency: 1410 num_cores: 108 environment { key: "architecture" value: "8.0" } environment { key: "cuda" value: "11020" } environment { key: "cudnn" value: "8100" } num_registers: 65536 l1_cache_size: 24576 l2_cache_size: 41943040 shared_memory_size_per_multiprocessor: 167936 memory_size: 40202993664 bandwidth: 1555200000 } outputs { dtype: DT_FLOAT shape { dim { size: -9 } dim { size: -56 } dim { size: -57 } dim { size: 1 } } }
Predicting... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% ETA: 0:00:00 16.0 FPS
INFO:sleap.nn.evals:Saved predictions: models/230312_144956.centered_instance/labels_pr.val.slp
INFO:sleap.nn.evals:Saved metrics: models/230312_144956.centered_instance/metrics.val.npz
INFO:sleap.nn.evals:OKS mAP: 0.830889
You will notice that the metrics evaluation still runs, but with PredictCost() errors. I then ran prediction on the suggested frames:
INFO:numexpr.utils:Note: NumExpr detected 12 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 8.
INFO:numexpr.utils:NumExpr defaulting to 8 threads.
Started inference at: 2023-03-12 20:33:42.799078
│ 'data_path': 'resolved_skeletons_with_predictions.pkg.slp',
│ 'models': [
│ │ 'models/230312_144956.centroid',
│ │ 'models/230312_144956.centered_instance'
│ ],
│ 'frames': '',
│ 'only_labeled_frames': False,
│ 'only_suggested_frames': True,
│ 'output': '230312_144956_predicted_suggestions.slp',
│ 'no_empty_frames': False,
│ 'verbosity': 'rich',
│ 'video.dataset': None,
│ 'video.input_format': 'channels_last',
│ 'video.index': '',
│ 'cpu': False,
│ 'first_gpu': False,
│ 'last_gpu': False,
│ 'gpu': 'auto',
│ 'max_edge_length_ratio': 0.25,
│ 'dist_penalty_weight': 1.0,
│ 'batch_size': 4,
│ 'open_in_gui': False,
│ 'peak_threshold': 0.2,
│ 'tracking.tracker': None,
│ 'tracking.target_instance_count': None,
│ 'tracking.pre_cull_to_target': None,
│ 'tracking.pre_cull_iou_threshold': None,
│ 'tracking.post_connect_single_breaks': None,
│ 'tracking.clean_instance_count': None,
│ 'tracking.clean_iou_threshold': None,
│ 'tracking.similarity': None,
│ 'tracking.match': None,
│ 'tracking.robust': None,
│ 'tracking.track_window': None,
│ 'tracking.min_new_track_points': None,
│ 'tracking.min_match_points': None,
│ 'tracking.img_scale': None,
│ 'tracking.of_window_size': None,
│ 'tracking.of_max_levels': None,
│ 'tracking.save_shifted_instances': None,
│ 'tracking.kf_node_indices': None,
│ 'tracking.kf_init_frame_count': None
INFO:sleap.nn.inference:Auto-selected GPU 0 with 40510 MiB of free memory.
SLEAP: 1.3.0a0
TensorFlow: 2.8.4
Numpy: 1.22.4
Python: 3.9.16
OS: Linux-5.10.147+-x86_64-with-glibc2.31
GPUs: 1/1 available
Device: /physical_device:GPU:0
Available: True
Initalized: False
Memory growth: True
Predicting... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0% ETA: -:--:-- ?2023-03-12 20:33:56.497439: W tensorflow/core/grappler/costs/] Error in PredictCost() for the op: op: "CropAndResize" attr { key: "T" value { type: DT_FLOAT } } attr { key: "extrapolation_value" value { f: 0 } } attr { key: "method" value { s: "bilinear" } } inputs { dtype: DT_FLOAT shape { dim { size: -56 } dim { size: -57 } dim { size: -58 } dim { size: 1 } } } inputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: 4 } } } inputs { dtype: DT_INT32 shape { dim { size: -2 } } } inputs { dtype: DT_INT32 shape { dim { size: 2 } } } device { type: "GPU" vendor: "NVIDIA" model: "NVIDIA A100-SXM4-40GB" frequency: 1410 num_cores: 108 environment { key: "architecture" value: "8.0" } environment { key: "cuda" value: "11020" } environment { key: "cudnn" value: "8100" } num_registers: 65536 l1_cache_size: 24576 l2_cache_size: 41943040 shared_memory_size_per_multiprocessor: 167936 memory_size: 40202993664 bandwidth: 1555200000 } outputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: -59 } dim { size: -60 } dim { size: 1 } } }
2023-03-12 20:33:56.497985: W tensorflow/core/grappler/costs/] Error in PredictCost() for the op: op: "CropAndResize" attr { key: "T" value { type: DT_UINT8 } } attr { key: "extrapolation_value" value { f: 0 } } attr { key: "method" value { s: "bilinear" } } inputs { dtype: DT_UINT8 shape { dim { size: 4 } dim { size: 1080 } dim { size: 1920 } dim { size: 3 } } } inputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: 4 } } } inputs { dtype: DT_INT32 shape { dim { size: -2 } } } inputs { dtype: DT_INT32 shape { dim { size: 2 } } } device { type: "CPU" vendor: "GenuineIntel" model: "101" frequency: 2200 num_cores: 12 environment { key: "cpu_instruction_set" value: "AVX SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2" } environment { key: "eigen" value: "3.4.90" } l1_cache_size: 32768 l2_cache_size: 1048576 l3_cache_size: 40370176 memory_size: 268435456 } outputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: -67 } dim { size: -68 } dim { size: 3 } } }
2023-03-12 20:33:56.503357: W tensorflow/core/grappler/costs/] Error in PredictCost() for the op: op: "CropAndResize" attr { key: "T" value { type: DT_FLOAT } } attr { key: "extrapolation_value" value { f: 0 } } attr { key: "method" value { s: "bilinear" } } inputs { dtype: DT_FLOAT shape { dim { size: -111 } dim { size: -112 } dim { size: -113 } dim { size: 1 } } } inputs { dtype: DT_FLOAT shape { dim { size: -6 } dim { size: 4 } } } inputs { dtype: DT_INT32 shape { dim { size: -6 } } } inputs { dtype: DT_INT32 shape { dim { size: 2 } } } device { type: "GPU" vendor: "NVIDIA" model: "NVIDIA A100-SXM4-40GB" frequency: 1410 num_cores: 108 environment { key: "architecture" value: "8.0" } environment { key: "cuda" value: "11020" } environment { key: "cudnn" value: "8100" } num_registers: 65536 l1_cache_size: 24576 l2_cache_size: 41943040 shared_memory_size_per_multiprocessor: 167936 memory_size: 40202993664 bandwidth: 1555200000 } outputs { dtype: DT_FLOAT shape { dim { size: -6 } dim { size: -114 } dim { size: -115 } dim { size: 1 } } }
Predicting... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╸━━ 94% ETA: 0:00:01 76.1 FPS2023-03-12 20:34:01.203800: W tensorflow/core/grappler/costs/] Error in PredictCost() for the op: op: "CropAndResize" attr { key: "T" value { type: DT_FLOAT } } attr { key: "extrapolation_value" value { f: 0 } } attr { key: "method" value { s: "bilinear" } } inputs { dtype: DT_FLOAT shape { dim { size: -56 } dim { size: -57 } dim { size: -58 } dim { size: 1 } } } inputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: 4 } } } inputs { dtype: DT_INT32 shape { dim { size: -2 } } } inputs { dtype: DT_INT32 shape { dim { size: 2 } } } device { type: "GPU" vendor: "NVIDIA" model: "NVIDIA A100-SXM4-40GB" frequency: 1410 num_cores: 108 environment { key: "architecture" value: "8.0" } environment { key: "cuda" value: "11020" } environment { key: "cudnn" value: "8100" } num_registers: 65536 l1_cache_size: 24576 l2_cache_size: 41943040 shared_memory_size_per_multiprocessor: 167936 memory_size: 40202993664 bandwidth: 1555200000 } outputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: -59 } dim { size: -60 } dim { size: 1 } } }
2023-03-12 20:34:01.204359: W tensorflow/core/grappler/costs/] Error in PredictCost() for the op: op: "CropAndResize" attr { key: "T" value { type: DT_UINT8 } } attr { key: "extrapolation_value" value { f: 0 } } attr { key: "method" value { s: "bilinear" } } inputs { dtype: DT_UINT8 shape { dim { size: 3 } dim { size: 1080 } dim { size: 1920 } dim { size: 3 } } } inputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: 4 } } } inputs { dtype: DT_INT32 shape { dim { size: -2 } } } inputs { dtype: DT_INT32 shape { dim { size: 2 } } } device { type: "CPU" vendor: "GenuineIntel" model: "101" frequency: 2200 num_cores: 12 environment { key: "cpu_instruction_set" value: "AVX SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2" } environment { key: "eigen" value: "3.4.90" } l1_cache_size: 32768 l2_cache_size: 1048576 l3_cache_size: 40370176 memory_size: 268435456 } outputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: -67 } dim { size: -68 } dim { size: 3 } } }
2023-03-12 20:34:01.209752: W tensorflow/core/grappler/costs/] Error in PredictCost() for the op: op: "CropAndResize" attr { key: "T" value { type: DT_FLOAT } } attr { key: "extrapolation_value" value { f: 0 } } attr { key: "method" value { s: "bilinear" } } inputs { dtype: DT_FLOAT shape { dim { size: -111 } dim { size: -112 } dim { size: -113 } dim { size: 1 } } } inputs { dtype: DT_FLOAT shape { dim { size: -6 } dim { size: 4 } } } inputs { dtype: DT_INT32 shape { dim { size: -6 } } } inputs { dtype: DT_INT32 shape { dim { size: 2 } } } device { type: "GPU" vendor: "NVIDIA" model: "NVIDIA A100-SXM4-40GB" frequency: 1410 num_cores: 108 environment { key: "architecture" value: "8.0" } environment { key: "cuda" value: "11020" } environment { key: "cudnn" value: "8100" } num_registers: 65536 l1_cache_size: 24576 l2_cache_size: 41943040 shared_memory_size_per_multiprocessor: 167936 memory_size: 40202993664 bandwidth: 1555200000 } outputs { dtype: DT_FLOAT shape { dim { size: -6 } dim { size: -114 } dim { size: -115 } dim { size: 1 } } }
Predicting... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% ETA: 0:00:00 18.7 FPS
Finished inference at: 2023-03-12 20:34:02.183675
Total runtime: 19.384610176086426 secs
Predicted frames: 51/51
│ 'model_paths': [
│ │ 'models/230312_144956.centroid/training_config.json',
│ │ 'models/230312_144956.centered_instance/training_config.json'
│ ],
│ 'predictor': 'TopDownPredictor',
│ 'sleap_version': '1.3.0a0',
│ 'platform': 'Linux-5.10.147+-x86_64-with-glibc2.31',
│ 'command': '/usr/local/bin/sleap-track -m models/230312_144956.centroid -m models/230312_144956.centered_instance --only-suggested-frames -o 230312_144956_predicted_suggestions.slp resolved_skeletons_with_predictions.pkg.slp',
│ 'data_path': 'resolved_skeletons_with_predictions.pkg.slp',
│ 'output_path': '230312_144956_predicted_suggestions.slp',
│ 'total_elapsed': 19.384610176086426,
│ 'start_timestamp': '2023-03-12 20:33:42.799078',
│ 'finish_timestamp': '2023-03-12 20:34:02.183675'
Saved output: 230312_144956_predicted_suggestions.slp
But the problem is that the prediction file is empty, even though it has the same file size (57 KB) as previous prediction files that worked. When I merge the prediction file into my current SLEAP project, nothing happens. When I open the prediction file by itself, nothing shows up either, but that could be because there isn't a video file attached to it.
Additionally, as in my previous comment, I am still unable to see a model metric evaluation. Please let me know if there is something else I can provide to help solve this issue. I am stuck until this is solved.
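One way to check whether the saved predictions file actually contains instances, independent of the GUI and of whether a video is attached, is to load it in Python and count what is inside. The sketch below is only an illustration: it assumes SLEAP's `sleap.load_file` and the `labeled_frames` / `instances` attributes of the loaded labels, and the helper names `summarize_predictions` and `load_and_summarize` are hypothetical.

```python
def summarize_predictions(labels):
    """Count frames and instances in a loaded labels object.

    Assumes `labels.labeled_frames` is an iterable of frames that each
    expose an `instances` list, as SLEAP `Labels` objects do.
    """
    frames = list(labels.labeled_frames)
    n_instances = sum(len(lf.instances) for lf in frames)
    n_empty = sum(1 for lf in frames if len(lf.instances) == 0)
    return {"frames": len(frames), "instances": n_instances, "empty_frames": n_empty}


def load_and_summarize(path):
    """Load a .slp file and summarize it (needs a SLEAP environment)."""
    import sleap  # imported lazily so the counting helper works without SLEAP

    return summarize_predictions(sleap.load_file(path))
```

Calling `load_and_summarize("230312_144956_predicted_suggestions.slp")` would then show whether the 51 predicted frames were saved with zero instances each, which would be consistent with nothing appearing after a merge.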
Hi @amblypatty,
Could you send everything needed to do the training/inference (video, slp, models) and the 230312_144956_predicted_suggestions.slp to [email protected]? Sorry, GitHub doesn't notify for reactions, but thanks for bumping this again - it had gotten buried... Let's get you unstuck.
Thanks, Liezl
One of our labmates also seems to be experiencing this issue. I can send you an example if you want, Liezl, but they're currently running an older SLEAP version.
I think I am experiencing a similar issue. I am very new to this, but reading through this thread, it seems very similar to what happens for me. I have tried optimising the training parameters for my top-down multi-animal model, and when I tweak the input scaling (and the max stride) settings, in some cases I receive an error message in the GUI saying that the training failed. For my centroid model, keeping the input scaling at 0.5 and the max stride at 32 works, but when I increase the input scaling to 1.0 and the max stride to 64, I start seeing this issue. I will keep an eye on this thread; I just thought I would mention that I am experiencing this. Thank you also for an amazing tool. I really like SLEAP.
Hello, I am getting this issue as well, but at an input scaling of 0.5. I need to use 0.5 to get the model to run on my 8GB GPU with 1280x1024 video; with that change, plus reducing filters from 64 to 48 and the filters rate from 2 to 1.5, I was finally able to get the model to run. Error log attached. Is there anything I can do? Thanks for the support.
} job:{ "data": { "labels": { "training_labels": null, "validation_labels": null, "validation_fraction": 0.1, "test_labels": null, "split_by_inds": false, "training_inds": null, "validation_inds": null, "test_inds": null, "search_path_hints": [], "skeletons": [] }, "preprocessing": { "ensure_rgb": false, "ensure_grayscale": false, "imagenet_mode": null, "input_scaling": 0.5, "pad_to_stride": null, "resize_and_pad_to_target": true, "target_height": null, "target_width": null }, "instance_cropping": { "center_on_part": "back", "crop_size": 592, "crop_size_detection_padding": 16 } }, "model": { "backbone": { "leap": null, "unet": { "stem_stride": null, "max_stride": 16, "output_stride": 2, "filters": 48, "filters_rate": 1.5, "middle_block": true, "up_interpolate": false, "stacks": 1 }, "hourglass": null, "resnet": null, "pretrained_encoder": null }, "heads": { "single_instance": null, "centroid": null, "centered_instance": null, "multi_instance": null, "multi_class_bottomup": null, "multi_class_topdown": { "confmaps": { "anchor_part": "back", "part_names": null, "sigma": 5.0, "output_stride": 2, "loss_weight": 1.0, "offset_refinement": false }, "class_vectors": { "classes": [ "o", "d" ], "num_fc_layers": 3, "num_fc_units": 64, "global_pool": true, "output_stride": 16, "loss_weight": 1.0 } } }, "base_checkpoint": null }, "optimization": { "preload_data": true, "augmentation_config": { "rotate": false, "rotation_min_angle": -180.0, "rotation_max_angle": 180.0, "translate": false, "translate_min": -5, "translate_max": 5, "scale": false, "scale_min": 0.9, "scale_max": 1.1, "uniform_noise": false, "uniform_noise_min_val": 0.0, "uniform_noise_max_val": 10.0, "gaussian_noise": false, "gaussian_noise_mean": 5.0, "gaussian_noise_stddev": 1.0, "contrast": false, "contrast_min_gamma": 0.5, "contrast_max_gamma": 2.0, "brightness": false, "brightness_min_val": 0.0, "brightness_max_val": 10.0, "random_crop": false, "random_crop_height": 256, "random_crop_width": 256, "random_flip": 
true, "flip_horizontal": false }, "online_shuffling": true, "shuffle_buffer_size": 128, "prefetch": true, "batch_size": 8, "batches_per_epoch": null, "min_batches_per_epoch": 200, "val_batches_per_epoch": null, "min_val_batches_per_epoch": 10, "epochs": 100, "optimizer": "adam", "initial_learning_rate": 0.0001, "learning_rate_schedule": { "reduce_on_plateau": true, "reduction_factor": 0.5, "plateau_min_delta": 1e-06, "plateau_patience": 5, "plateau_cooldown": 3, "min_learning_rate": 1e-08 }, "hard_keypoint_mining": { "online_mining": false, "hard_to_easy_ratio": 2.0, "min_hard_keypoints": 2, "max_hard_keypoints": null, "loss_scale": 5.0 }, "early_stopping": { "stop_training_on_plateau": true, "plateau_min_delta": 1e-06, "plateau_patience": 10 } }, "outputs": { "save_outputs": true, "run_name": "231103_162437.multi_class_topdown.n=20", "run_name_prefix": "", "run_name_suffix": "", "runs_folder": "C:/ml/sleap/labels\models", "tags": [ "" ], "save_visualizations": true, "delete_viz_images": true, "zip_outputs": false, "log_to_csv": true, "checkpointing": { "initial_model": false, "best_model": true, "every_epoch": false, "latest_model": false, "final_model": false }, "tensorboard": { "write_logs": false, "loss_frequency": "epoch", "architecture_graph": false, "profile_graph": false, "visualizations": true }, "zmq": { "subscribe_to_controller": true, "controller_address": "tcp://", "controller_polling_timeout": 10, "publish_updates": true, "publish_address": "tcp://" } }, "name": "", "description": "", "sleap_version": "1.3.3", "filename": "C:\Users\smasr\AppData\Local\Temp\tmpb8i55rmq\231103_162437_training_job.json" } GPU 0 with 7963 MiB of free memory. GPU 0 for acceleration. GPU memory pre-allocation. GPUs: 1/1 available Device: /physical_device:GPU:0 Available: True Initalized: False Memory growth: True trainer... 
training labels from: C:/ml/sleap/labels/labels.v001.slp training and validation splits from validation fraction: 0.1 Splits: Training = 18 / Validation = 2. up for training... up pipeline builders... up model... test pipeline... 2023-11-03 16:24:41.318359: I tensorflow/core/platform/] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX AVX2 To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 2023-11-03 16:24:41.699298: I tensorflow/core/common_runtime/gpu/] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 5417 MB memory: -> device: 0, name: NVIDIA GeForce RTX 4070 Laptop GPU, pci bus id: 0000:01:00.0, compute capability: 8.9 test example. [2.027s] Input shape: (592, 592, 3) Keras model. Backbone: UNet(stacks=1, filters=48, filters_rate=1.5, kernel_size=3, stem_kernel_size=7, convs_per_block=2, stem_blocks=0, down_blocks=4, middle_block=True, up_blocks=3, up_interpolate=False, block_contraction=False) Max stride: 16 Parameters: 3,326,096 Heads: [0] = CenteredInstanceConfmapsHead(part_names=['nose', 'neck', 'back', 'tailstart', 'tailend'], anchor_part='back', sigma=5.0, output_stride=2, loss_weight=1.0) [1] = ClassVectorsHead(classes=['o', 'd'], num_fc_layers=3, num_fc_units=64, global_pool=True, output_stride=16, loss_weight=1.0) Outputs: [0] = KerasTensor(type_spec=TensorSpec(shape=(None, 296, 296, 5), dtype=tf.float32, name=None), name='CenteredInstanceConfmapsHead/BiasAdd:0', description="created by layer 'CenteredInstanceConfmapsHead'") [1] = KerasTensor(type_spec=TensorSpec(shape=(None, 2), dtype=tf.float32, name=None), name='ClassVectorsHead/Softmax:0', description="created by layer 'ClassVectorsHead'") from scratch up data pipelines... set: n = 18 set: n = 2 up optimization... 
Learning rate schedule: LearningRateScheduleConfig(reduce_on_plateau=True, reduction_factor=0.5, plateau_min_delta=1e-06, plateau_patience=5, plateau_cooldown=3, min_learning_rate=1e-08) Early stopping: EarlyStoppingConfig(stop_training_on_plateau=True, plateau_min_delta=1e-06, plateau_patience=10) up outputs... INFO:sleap.nn.callbacks:Training controller subscribed to: tcp:// (topic: ) ZMQ controller subcribed to: tcp:// INFO:sleap.nn.callbacks:Progress reporter publishing on: tcp:// for: not_set ZMQ progress reporter publish on: tcp:// run path: C:/ml/sleap/labels\models\231103_162437.multi_class_topdown.n=20 up visualization... trainer set up. [3.3s] for training data generation... creating training datasets. [3.2s] training loop... Epoch 1/100 2023-11-03 16:24:50.027369: I tensorflow/stream_executor/cuda/] Loaded cuDNN version 8201 2023-11-03 16:24:51.105728: W tensorflow/stream_executor/gpu/] INTERNAL: ptxas exited with non-zero error code -1, output: Relying on driver to perform ptx compilation. Modify $PATH to customize ptxas location. This message will be only logged once. 2023-11-03 16:24:54.369871: W tensorflow/core/common_runtime/] Allocator (GPU_0_bfc) ran out of memory trying to allocate 3.43GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available. 2023-11-03 16:24:54.370160: W tensorflow/core/common_runtime/] Allocator (GPU_0_bfc) ran out of memory trying to allocate 3.43GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available. 2023-11-03 16:24:56.097067: I tensorflow/stream_executor/cuda/] TensorFloat-32 will be used for the matrix multiplication. This will only be logged once. 2023-11-03 16:24:56.987300: W tensorflow/core/common_runtime/] Allocator (GPU_0_bfc) ran out of memory trying to allocate 3.40GiB with freed_by_count=0. 
The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available. 2023-11-03 16:24:56.987450: W tensorflow/core/common_runtime/] Allocator (GPU_0_bfc) ran out of memory trying to allocate 3.40GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available. 2023-11-03 16:24:57.053966: W tensorflow/core/common_runtime/] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.55GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available. 2023-11-03 16:24:57.054198: W tensorflow/core/common_runtime/] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.55GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available. 2023-11-03 16:24:57.415488: W tensorflow/core/common_runtime/] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.71GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available. 2023-11-03 16:24:57.416283: W tensorflow/core/common_runtime/] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.71GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available. 2023-11-03 16:24:57.829054: W tensorflow/core/common_runtime/] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.55GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available. 2023-11-03 16:24:57.829268: W tensorflow/core/common_runtime/] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.55GiB with freed_by_count=0. 
The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
WARNING:tensorflow:Callback method on_train_batch_end is slow compared to the batch time (batch time: 0.1977s vs on_train_batch_end time: 0.2514s). Check your callbacks.
Traceback (most recent call last):
File "C:\Users\smasr.conda\envs\das\envs\sleap2\Scripts\", line 33, in
sys.exit(load_entry_point('sleap==1.3.3', 'console_scripts', 'sleap-train')())
File "C:\Users\smasr.conda\envs\das\envs\sleap2\lib\site-packages\sleap\nn\", line 2014, in main
trainer.train()
File "C:\Users\smasr.conda\envs\das\envs\sleap2\lib\site-packages\sleap\nn\", line 941, in train
verbose=2,
File "C:\Users\smasr.conda\envs\das\envs\sleap2\lib\site-packages\keras\utils\", line 67, in error_handler
raise e.with_traceback(filtered_tb) from None
File "C:\Users\smasr.conda\envs\das\envs\sleap2\lib\site-packages\sleap\nn\", line 280, in on_epoch_end
figure = self.plot_fn()
File "C:\Users\smasr.conda\envs\das\envs\sleap2\lib\site-packages\sleap\nn\", line 1786, in
viz_fn=lambda: visualize_example(next(training_viz_ds_iter)),
File "C:\Users\smasr.conda\envs\das\envs\sleap2\lib\site-packages\sleap\nn\", line 1766, in visualize_example
preds = find_peaks(tf.expand_dims(example["instance_image"], axis=0))
File "C:\Users\smasr.conda\envs\das\envs\sleap2\lib\site-packages\sleap\nn\", line 2088, in call
out = self.keras_model(crops)
ValueError: Exception encountered when calling layer "find_instance_peaks" (type FindInstancePeaks).
Input 0 of layer "model" is incompatible with the layer: expected shape=(None, 592, 592, 3), found shape=(1, 296, 296, 3)
Hi @smasri09,
I know you said that you needed an input scaling of 0.5 to get the model to run on your 8GB GPU, but is there any way you can keep the top-down-id model at an input scaling of 1 and just adjust the centroid model's input scaling, maybe even lowering it below 0.5? Similar to the centered instance model, the top-down-id model does not support adjusting the input scaling: it relies on the centroid model to take crops of the full image, which saves memory, but then keeps full resolution in the crop to accurately locate smaller body parts.
Thanks, Liezl
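To catch this before a training run fails at the first visualization step, the training job JSON can be checked for a crop-based head combined with an input scaling other than 1.0. This is a minimal sketch, not part of SLEAP: the key names (`data.preprocessing.input_scaling`, `data.instance_cropping.crop_size`, `model.heads.*`) are taken from the job dumps in this thread, while the function names and the list of crop-based heads are assumptions made for illustration.

```python
import json

# Head types that operate on instance crops rather than the full frame
# (assumed from this thread: centered instance and top-down-id models).
CROP_BASED_HEADS = ("centered_instance", "multi_class_topdown")


def check_input_scaling(job):
    """Return a warning string if a crop-based head uses input_scaling != 1.0."""
    scaling = job["data"]["preprocessing"]["input_scaling"]
    heads = job["model"]["heads"]
    active = [name for name in CROP_BASED_HEADS if heads.get(name) is not None]
    if not active or scaling == 1.0:
        return None
    msg = f"{active[0]} head with input_scaling={scaling}"
    crop_size = job["data"]["instance_cropping"]["crop_size"]
    if crop_size is not None:
        # Mirrors the shape mismatch in the traceback: 592 * 0.5 = 296.
        scaled = int(crop_size * scaling)
        msg += (
            f": model expects {crop_size}x{crop_size} crops, "
            f"scaled crops would be {scaled}x{scaled}"
        )
    return msg


def check_job_file(path):
    """Load a training job JSON file and run the check on it."""
    with open(path) as f:
        return check_input_scaling(json.load(f))
```

For the config posted above (`crop_size` 592, `input_scaling` 0.5, `multi_class_topdown` head), the check reports 592x592 expected versus 296x296 scaled, which matches the `ValueError`. Setting `input_scaling` to 1.0 for that model makes the check return `None`.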
Hi, I ran into the same issue training a LEAP-backbone top-down centered instance model with input scaling at 0.25 (it was just there by default, or maybe from an earlier run). Google brought me here, and switching input scaling to 1 fixed the problem. Here's the log in case that is helpful:
Output Log:
Using already trained model for centroid: /home/ammon/Documents/Scripts/FishTrack/sleap/models/240222_132820.centroid.n=20/training_config.json
Resetting monitor window.
Polling: /home/ammon/Documents/Scripts/FishTrack/sleap/models/240222_145208.centered_instance.n=20/viz/validation.*.png
Start training centered_instance...
['sleap-train', '/tmp/tmpvfx421d7/240222_145208_training_job.json', '/home/ammon/Documents/Scripts/FishTrack/sleap/jallefish.labels.v001.slp', '--zmq', '--save_viz']
SLEAP: 1.3.3
TensorFlow: 2.7.0
Numpy: 1.19.5
Python: 3.7.12
OS: Linux-5.15.0-94-generic-x86_64-with-debian-bullseye-sid labels file: /home/ammon/Documents/Scripts/FishTrack/sleap/jallefish.labels.v001.slp profile: /tmp/tmpvfx421d7/240222_145208_training_job.json{
"training_job_path": "/tmp/tmpvfx421d7/240222_145208_training_job.json",
"labels_path": "/home/ammon/Documents/Scripts/FishTrack/sleap/jallefish.labels.v001.slp",
"video_paths": [
"val_labels": null,
"test_labels": null,
"base_checkpoint": null,
"tensorboard": false,
"save_viz": true,
"zmq": true,
"run_name": "",
"prefix": "",
"suffix": "",
"cpu": false,
"first_gpu": false,
"last_gpu": false,
"gpu": "auto"
} job:{
"data": {
"labels": {
"training_labels": null,
"validation_labels": null,
"validation_fraction": 0.1,
"test_labels": null,
"split_by_inds": false,
"training_inds": null,
"validation_inds": null,
"test_inds": null,
"search_path_hints": [],
"skeletons": []
"preprocessing": {
"ensure_rgb": false,
"ensure_grayscale": true,
"imagenet_mode": null,
"input_scaling": 0.25,
"pad_to_stride": null,
"resize_and_pad_to_target": true,
"target_height": null,
"target_width": null
"instance_cropping": {
"center_on_part": "Body-line",
"crop_size": null,
"crop_size_detection_padding": 16
"model": {
"backbone": {
"leap": {
"max_stride": 8,
"output_stride": 4,
"filters": 64,
"filters_rate": 2.0,
"up_interpolate": false,
"stacks": 1
"unet": null,
"hourglass": null,
"resnet": null,
"pretrained_encoder": null
"heads": {
"single_instance": null,
"centroid": null,
"centered_instance": {
"anchor_part": "Body-line",
"part_names": null,
"sigma": 2.5,
"output_stride": 4,
"loss_weight": 1.0,
"offset_refinement": false
"multi_instance": null,
"multi_class_bottomup": null,
"multi_class_topdown": null
"base_checkpoint": null
"optimization": {
"preload_data": true,
"augmentation_config": {
"rotate": true,
"rotation_min_angle": -15.0,
"rotation_max_angle": 15.0,
"translate": false,
"translate_min": -5,
"translate_max": 5,
"scale": false,
"scale_min": 0.9,
"scale_max": 1.1,
"uniform_noise": false,
"uniform_noise_min_val": 0.0,
"uniform_noise_max_val": 10.0,
"gaussian_noise": true,
"gaussian_noise_mean": 5.0,
"gaussian_noise_stddev": 1.0,
"contrast": false,
"contrast_min_gamma": 0.5,
"contrast_max_gamma": 2.0,
"brightness": true,
"brightness_min_val": 0.0,
"brightness_max_val": 10.0,
"random_crop": false,
"random_crop_height": 256,
"random_crop_width": 256,
"random_flip": true,
"flip_horizontal": false
"online_shuffling": true,
"shuffle_buffer_size": 128,
"prefetch": true,
"batch_size": 8,
"batches_per_epoch": null,
"min_batches_per_epoch": 200,
"val_batches_per_epoch": null,
"min_val_batches_per_epoch": 10,
"epochs": 200,
"optimizer": "adam",
"initial_learning_rate": 0.0001,
"learning_rate_schedule": {
"reduce_on_plateau": true,
"reduction_factor": 0.5,
"plateau_min_delta": 1e-06,
"plateau_patience": 5,
"plateau_cooldown": 3,
"min_learning_rate": 1e-08
"hard_keypoint_mining": {
"online_mining": false,
"hard_to_easy_ratio": 2.0,
"min_hard_keypoints": 2,
"max_hard_keypoints": null,
"loss_scale": 5.0
"early_stopping": {
"stop_training_on_plateau": true,
"plateau_min_delta": 1e-08,
"plateau_patience": 10
"outputs": {
"save_outputs": true,
"run_name": "240222_145208.centered_instance.n=20",
"run_name_prefix": "",
"run_name_suffix": "",
"runs_folder": "/home/ammon/Documents/Scripts/FishTrack/sleap/models",
"tags": [
"save_visualizations": true,
"delete_viz_images": true,
"zip_outputs": false,
"log_to_csv": true,
"checkpointing": {
"initial_model": false,
"best_model": true,
"every_epoch": false,
"latest_model": false,
"final_model": false
"tensorboard": {
"write_logs": false,
"loss_frequency": "epoch",
"architecture_graph": false,
"profile_graph": false,
"visualizations": true
"zmq": {
"subscribe_to_controller": true,
"controller_address": "tcp://",
"controller_polling_timeout": 10,
"publish_updates": true,
"publish_address": "tcp://"
"name": "",
"description": "",
"sleap_version": "1.3.3",
"filename": "/tmp/tmpvfx421d7/240222_145208_training_job.json"
2024-02-22 14:52:10.597671: I tensorflow/stream_executor/cuda/] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2024-02-22 14:52:10.603512: I tensorflow/stream_executor/cuda/] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2024-02-22 14:52:10.603672: I tensorflow/stream_executor/cuda/] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
GPU 0 with 11038 MiB of free memory.
GPU 0 for acceleration.
GPU memory pre-allocation.
GPUs: 1/1 available
Device: /physical_device:GPU:0
Available: True
Initalized: False
Memory growth: True
trainer...
training labels from: /home/ammon/Documents/Scripts/FishTrack/sleap/jallefish.labels.v001.slp
training and validation splits from validation fraction: 0.1
Splits: Training = 18 / Validation = 2.
up for training...
up pipeline builders...
up model...
test pipeline...
2024-02-22 14:52:11.407698: I tensorflow/core/platform/] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-02-22 14:52:11.408510: I tensorflow/stream_executor/cuda/] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2024-02-22 14:52:11.408692: I tensorflow/stream_executor/cuda/] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2024-02-22 14:52:11.408806: I tensorflow/stream_executor/cuda/] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2024-02-22 14:52:11.733932: I tensorflow/stream_executor/cuda/] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2024-02-22 14:52:11.734090: I tensorflow/stream_executor/cuda/] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2024-02-22 14:52:11.734232: I tensorflow/stream_executor/cuda/] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2024-02-22 14:52:11.734326: I tensorflow/core/common_runtime/gpu/] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 9233 MB memory: -> device: 0, name: NVIDIA GeForce RTX 3060, pci bus id: 0000:01:00.0, compute capability: 8.6
test example. [1.962s]
Input shape: (32, 32, 1)
Keras model.
Backbone: LeapCNN(stacks=1, filters=64, filters_rate=2.0, down_blocks=3, down_convs_per_block=3, up_blocks=1, up_interpolate=False, up_convs_per_block=2)
Max stride: 8
Parameters: 2,509,443
Heads:
[0] = CenteredInstanceConfmapsHead(part_names=['Mouth', 'Body-line', 'Tail-tip'], anchor_part='Body-line', sigma=2.5, output_stride=4, loss_weight=1.0)
Outputs:
[0] = KerasTensor(type_spec=TensorSpec(shape=(None, 8, 8, 3), dtype=tf.float32, name=None), name='CenteredInstanceConfmapsHead/BiasAdd:0', description="created by layer 'CenteredInstanceConfmapsHead'")
from scratch
up data pipelines...
set: n = 18
set: n = 2
up optimization...
Learning rate schedule: LearningRateScheduleConfig(reduce_on_plateau=True, reduction_factor=0.5, plateau_min_delta=1e-06, plateau_patience=5, plateau_cooldown=3, min_learning_rate=1e-08)
Early stopping: EarlyStoppingConfig(stop_training_on_plateau=True, plateau_min_delta=1e-08, plateau_patience=10)
up outputs...
INFO:sleap.nn.callbacks:Training controller subscribed to: tcp:// (topic: )
ZMQ controller subcribed to: tcp://
INFO:sleap.nn.callbacks:Progress reporter publishing on: tcp:// for: not_set
ZMQ progress reporter publish on: tcp://
run path: /home/ammon/Documents/Scripts/FishTrack/sleap/models/240222_145208.centered_instance.n=20
up visualization...
trainer set up. [3.4s]
for training data generation...
creating training datasets. [3.0s]
training loop...
Epoch 1/200
2024-02-22 14:52:19.009446: I tensorflow/stream_executor/cuda/] Loaded cuDNN version 8201
Traceback (most recent call last):
File "/home/ammon/anaconda3/envs/sleap/bin/sleap-train", line 33, in
Input 0 of layer "model" is incompatible with the layer: expected shape=(None, 32, 32, 1), found shape=(1, 8, 8, 1)
Call arguments received:
• inputs=tf.Tensor(shape=(1, 32, 32, 1), dtype=float32)
terminate called without an active exception
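The shapes in this log line up with the input scaling being applied a second time to a crop that is already at model resolution: the model was built for 32 x 32 inputs, and 32 * 0.25 = 8 matches the 8 x 8 tensor in the error. A minimal Python sketch of that arithmetic, for illustration only (the `scaled_shape` helper is hypothetical and is not part of SLEAP):

```python
def scaled_shape(shape, scale):
    """Return (height, width, channels) after resizing height and width by `scale`."""
    height, width, channels = shape
    return (int(height * scale), int(width * scale), channels)


# Values copied from the log above.
expected = (32, 32, 1)  # "Input shape" the Keras model was built with
input_scaling = 0.25    # data.preprocessing.input_scaling in the job config

# Scaling the already-scaled crop once more gives the shape in the error.
found = scaled_shape(expected, input_scaling)
print(found)  # (8, 8, 1)
```

With `input_scaling` at 1.0 the second scaling is a no-op, which would explain why that setting avoids the error.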
Hi, I ran into the same issue training a leap-backbone top-down centered instance model with input scaling at 0.25 (it was there by default, or maybe left over from an earlier run). Google brought me here, and switching input scaling to 1 fixed the problem. Here's the log in case it is helpful:
Output Log:
Hi! Just want to add that I'm running into this issue as well, with an updated SLEAP from conda. I assume it is being worked on, but in the meantime I was curious what other params (other than batch size) to tweak to make centered instance training smaller for our GPU limits.
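For reference, the workaround reported earlier in this thread is to set input scaling back to 1 for the centered instance model. A minimal sketch of doing that on a training profile with plain Python, assuming the profile is JSON with the `data.preprocessing.input_scaling` key shown in the log above; the `set_input_scaling` helper is hypothetical and not a SLEAP API:

```python
import json


def set_input_scaling(profile: dict, scale: float = 1.0) -> dict:
    """Set data.preprocessing.input_scaling in a training profile dict."""
    profile["data"]["preprocessing"]["input_scaling"] = scale
    return profile


# Trimmed-down stand-in for a real profile; only the relevant key is kept.
profile = json.loads('{"data": {"preprocessing": {"input_scaling": 0.25}}}')
patched = set_input_scaling(profile)
print(json.dumps(patched, indent=2))
```

For a real profile you would load the JSON file from disk, patch it the same way, and write it back before starting training.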