
Training Error When Mix Color and Grayscale Videos in Project

Open isabelperezf opened this issue 2 years ago • 8 comments

Training fails when the project has a mix of color and grayscale videos (even when we set "Convert Image To: grayscale"). This raises the questions:

  • Does "Convert Image To: grayscale" work as expected?
  • What do we need to do to allow a mix of color and grayscale videos?

Bug description

@auesro and I used SLEAP locally to extract and mark suggested frames. We chose grayscale to see the frames. Then, we exported the dataset via Predict → Run Training... → Export Training Job Package and tried to run training on Colab. Nevertheless, we got an unexpected error (see below) after running !sleap-train baseline.centroid.json proyectoprueba.pkg.slp in the Training Colab notebook.

Can someone help us?

Thanks ;)

Expected behaviour

Training all the networks

Actual behaviour

Colab error.

Your personal set up

Environment packages
# packages in environment at C:\Users\isabel.perezf\Miniconda3\envs\sleap:
#
# Name                    Version                   Build  Channel
absl-py                   0.15.0                   pypi_0    pypi
astunparse                1.6.3                    pypi_0    pypi
attrs                     21.2.0                   pypi_0    pypi
backports-zoneinfo        0.2.1                    pypi_0    pypi
ca-certificates           2021.10.8            h5b45459_0    conda-forge
cached-property           1.5.2                hd8ed1ab_1    conda-forge
cached_property           1.5.2              pyha770c72_1    conda-forge
cachetools                4.2.4                    pypi_0    pypi
cattrs                    1.1.1                    pypi_0    pypi
certifi                   2021.10.8                pypi_0    pypi
charset-normalizer        2.0.12                   pypi_0    pypi
clang                     5.0                      pypi_0    pypi
colorama                  0.4.4                    pypi_0    pypi
commonmark                0.9.1                    pypi_0    pypi
cuda-nvcc                 11.3.58              hb8d16a4_0    nvidia
cudatoolkit               11.3.1              h280eb24_10    conda-forge
cudnn                     8.2.1.32             h754d62a_0    conda-forge
cycler                    0.11.0                   pypi_0    pypi
efficientnet              1.0.0                    pypi_0    pypi
ffmpeg                    4.3.1                ha925a31_0    conda-forge
flatbuffers               1.12                     pypi_0    pypi
fonttools                 4.31.2                   pypi_0    pypi
freetype                  2.10.4               h546665d_1    conda-forge
gast                      0.4.0                    pypi_0    pypi
geos                      3.9.1                h39d44d4_2    conda-forge
google-auth               1.35.0                   pypi_0    pypi
google-auth-oauthlib      0.4.6                    pypi_0    pypi
google-pasta              0.2.0                    pypi_0    pypi
grpcio                    1.44.0                   pypi_0    pypi
h5py                      3.1.0           nompi_py37h19fda09_100    conda-forge
hdf5                      1.10.6          nompi_he0bbb20_101    conda-forge
idna                      3.3                      pypi_0    pypi
image-classifiers         1.0.0                    pypi_0    pypi
imageio                   2.15.0                   pypi_0    pypi
imgaug                    0.4.0                    pypi_0    pypi
imgstore                  0.2.9                    pypi_0    pypi
importlib-metadata        4.11.1                   pypi_0    pypi
intel-openmp              2022.0.0          h57928b3_3663    conda-forge
jbig                      2.1               h8d14728_2003    conda-forge
joblib                    1.1.0                    pypi_0    pypi
jpeg                      9e                   h8ffe710_1    conda-forge
jsmin                     3.0.1                    pypi_0    pypi
jsonpickle                1.2                      pypi_0    pypi
keras                     2.6.0                    pypi_0    pypi
keras-applications        1.0.8                    pypi_0    pypi
keras-preprocessing       1.1.2                    pypi_0    pypi
kiwisolver                1.4.2                    pypi_0    pypi
lcms2                     2.12                 h2a16943_0    conda-forge
lerc                      3.0                  h0e60522_0    conda-forge
libblas                   3.9.0              14_win64_mkl    conda-forge
libcblas                  3.9.0              14_win64_mkl    conda-forge
libdeflate                1.10                 h8ffe710_0    conda-forge
liblapack                 3.9.0              14_win64_mkl    conda-forge
libpng                    1.6.37               h1d00b33_2    conda-forge
libtiff                   4.3.0                hc4061b1_3    conda-forge
libzlib                   1.2.11            h8ffe710_1014    conda-forge
lz4-c                     1.9.3                h8ffe710_1    conda-forge
m2w64-gcc-libgfortran     5.3.0                         6    conda-forge
m2w64-gcc-libs            5.3.0                         7    conda-forge
m2w64-gcc-libs-core       5.3.0                         7    conda-forge
m2w64-gmp                 6.1.0                         2    conda-forge
m2w64-libwinpthread-git   5.0.0.4634.697f757               2    conda-forge
markdown                  3.3.6                    pypi_0    pypi
matplotlib                3.5.1                    pypi_0    pypi
mkl                       2022.0.0           h0e2418a_796    conda-forge
msys2-conda-epoch         20160418                      1    conda-forge
networkx                  2.6.3                    pypi_0    pypi
numpy                     1.19.5           py37h4c2b6ed_3    conda-forge
oauthlib                  3.2.0                    pypi_0    pypi
olefile                   0.46               pyh9f0ad1d_1    conda-forge
opencv-python             4.5.5.62                 pypi_0    pypi
opencv-python-headless    4.5.5.62                 pypi_0    pypi
openjpeg                  2.4.0                hb211442_1    conda-forge
openssl                   3.0.2                h8ffe710_1    conda-forge
opt-einsum                3.3.0                    pypi_0    pypi
packaging                 21.3                     pypi_0    pypi
pandas                    1.3.5            py37h9386db6_0    conda-forge
pillow                    8.4.0            py37hd7d9ad0_0    conda-forge
pip                       22.0.4             pyhd8ed1ab_0    conda-forge
protobuf                  3.19.4                   pypi_0    pypi
psutil                    5.9.0                    pypi_0    pypi
pyasn1                    0.4.8                    pypi_0    pypi
pyasn1-modules            0.2.8                    pypi_0    pypi
pygments                  2.11.2                   pypi_0    pypi
pykalman                  0.9.5                    pypi_0    pypi
pyparsing                 3.0.7                    pypi_0    pypi
pyreadline                2.1             py37h03978a9_1005    conda-forge
pyside2                   5.14.1                   pypi_0    pypi
python                    3.7.12          h900ac77_100_cpython    conda-forge
python-dateutil           2.8.2              pyhd8ed1ab_0    conda-forge
python-rapidjson          1.6                      pypi_0    pypi
python_abi                3.7                     2_cp37m    conda-forge
pytz                      2022.1             pyhd8ed1ab_0    conda-forge
pytz-deprecation-shim     0.1.0.post0              pypi_0    pypi
pywavelets                1.3.0                    pypi_0    pypi
pyzmq                     22.3.0                   pypi_0    pypi
qimage2ndarray            1.8.3                    pypi_0    pypi
requests                  2.27.1                   pypi_0    pypi
requests-oauthlib         1.3.1                    pypi_0    pypi
rich                      10.16.1                  pypi_0    pypi
scikit-image              0.19.2                   pypi_0    pypi
scikit-learn              1.0.2                    pypi_0    pypi
scikit-video              1.1.11                   pypi_0    pypi
scipy                     1.7.3            py37hb6553fb_0    conda-forge
seaborn                   0.11.2                   pypi_0    pypi
segmentation-models       1.0.1                    pypi_0    pypi
setuptools                59.8.0           py37h03978a9_1    conda-forge
setuptools-scm            6.3.2                    pypi_0    pypi
shapely                   1.7.1            py37hc520ffa_5    conda-forge
shiboken2                 5.14.1                   pypi_0    pypi
six                       1.15.0             pyh9f0ad1d_0    conda-forge
sleap                     1.2.2                    pypi_0    pypi
sqlite                    3.38.2               h8ffe710_0    conda-forge
tbb                       2021.5.0             h2d74725_1    conda-forge
tensorboard               2.6.0                    pypi_0    pypi
tensorboard-data-server   0.6.1                    pypi_0    pypi
tensorboard-plugin-wit    1.8.1                    pypi_0    pypi
tensorflow                2.6.3                    pypi_0    pypi
tensorflow-estimator      2.6.0                    pypi_0    pypi
termcolor                 1.1.0                    pypi_0    pypi
threadpoolctl             3.1.0                    pypi_0    pypi
tifffile                  2021.11.2                pypi_0    pypi
tk                        8.6.12               h8ffe710_0    conda-forge
tomli                     2.0.1                    pypi_0    pypi
typing-extensions         3.10.0.2                 pypi_0    pypi
tzdata                    2022.1                   pypi_0    pypi
tzlocal                   4.2                      pypi_0    pypi
ucrt                      10.0.20348.0         h57928b3_0    conda-forge
urllib3                   1.26.8                   pypi_0    pypi
vc                        14.2                 hb210afc_6    conda-forge
vs2015_runtime            14.29.30037          h902a5da_6    conda-forge
werkzeug                  2.0.3                    pypi_0    pypi
wheel                     0.37.1             pyhd8ed1ab_0    conda-forge
wrapt                     1.12.1                   pypi_0    pypi
xz                        5.2.5                h62dcd97_1    conda-forge
zipp                      3.7.0                    pypi_0    pypi
zlib                      1.2.11            h8ffe710_1014    conda-forge
zstd                      1.5.2                h6255e5f_0    conda-forge
Logs
INFO:numexpr.utils:NumExpr defaulting to 2 threads.
INFO:sleap.nn.training:Versions:
SLEAP: 1.2.3
TensorFlow: 2.8.0
Numpy: 1.21.5
Python: 3.7.13
OS: Linux-5.4.188+-x86_64-with-Ubuntu-18.04-bionic
INFO:sleap.nn.training:Training labels file: proyectoprueba.pkg.slp
INFO:sleap.nn.training:Training profile: /usr/local/lib/python3.7/dist-packages/sleap/training_profiles/baseline.centroid.json
INFO:sleap.nn.training:
INFO:sleap.nn.training:Arguments:
INFO:sleap.nn.training:{
    "training_job_path": "baseline.centroid.json",
    "labels_path": "proyectoprueba.pkg.slp",
    "video_paths": [
        ""
    ],
    "val_labels": null,
    "test_labels": null,
    "tensorboard": false,
    "save_viz": false,
    "zmq": false,
    "run_name": "",
    "prefix": "",
    "suffix": "",
    "cpu": false,
    "first_gpu": false,
    "last_gpu": false,
    "gpu": 0
}
INFO:sleap.nn.training:
INFO:sleap.nn.training:Training job:
INFO:sleap.nn.training:{
    "data": {
        "labels": {
            "training_labels": null,
            "validation_labels": null,
            "validation_fraction": 0.1,
            "test_labels": null,
            "split_by_inds": false,
            "training_inds": null,
            "validation_inds": null,
            "test_inds": null,
            "search_path_hints": [],
            "skeletons": []
        },
        "preprocessing": {
            "ensure_rgb": false,
            "ensure_grayscale": false,
            "imagenet_mode": null,
            "input_scaling": 0.5,
            "pad_to_stride": null,
            "resize_and_pad_to_target": true,
            "target_height": null,
            "target_width": null
        },
        "instance_cropping": {
            "center_on_part": null,
            "crop_size": null,
            "crop_size_detection_padding": 16
        }
    },
    "model": {
        "backbone": {
            "leap": null,
            "unet": {
                "stem_stride": null,
                "max_stride": 16,
                "output_stride": 2,
                "filters": 16,
                "filters_rate": 2.0,
                "middle_block": true,
                "up_interpolate": true,
                "stacks": 1
            },
            "hourglass": null,
            "resnet": null,
            "pretrained_encoder": null
        },
        "heads": {
            "single_instance": null,
            "centroid": {
                "anchor_part": null,
                "sigma": 2.5,
                "output_stride": 2,
                "loss_weight": 1.0,
                "offset_refinement": false
            },
            "centered_instance": null,
            "multi_instance": null,
            "multi_class_bottomup": null,
            "multi_class_topdown": null
        }
    },
    "optimization": {
        "preload_data": true,
        "augmentation_config": {
            "rotate": true,
            "rotation_min_angle": -15.0,
            "rotation_max_angle": 15.0,
            "translate": false,
            "translate_min": -5,
            "translate_max": 5,
            "scale": false,
            "scale_min": 0.9,
            "scale_max": 1.1,
            "uniform_noise": false,
            "uniform_noise_min_val": 0.0,
            "uniform_noise_max_val": 10.0,
            "gaussian_noise": false,
            "gaussian_noise_mean": 5.0,
            "gaussian_noise_stddev": 1.0,
            "contrast": false,
            "contrast_min_gamma": 0.5,
            "contrast_max_gamma": 2.0,
            "brightness": false,
            "brightness_min_val": 0.0,
            "brightness_max_val": 10.0,
            "random_crop": false,
            "random_crop_height": 256,
            "random_crop_width": 256,
            "random_flip": false,
            "flip_horizontal": true
        },
        "online_shuffling": true,
        "shuffle_buffer_size": 128,
        "prefetch": true,
        "batch_size": 4,
        "batches_per_epoch": null,
        "min_batches_per_epoch": 200,
        "val_batches_per_epoch": null,
        "min_val_batches_per_epoch": 10,
        "epochs": 200,
        "optimizer": "adam",
        "initial_learning_rate": 0.0001,
        "learning_rate_schedule": {
            "reduce_on_plateau": true,
            "reduction_factor": 0.5,
            "plateau_min_delta": 1e-08,
            "plateau_patience": 5,
            "plateau_cooldown": 3,
            "min_learning_rate": 1e-08
        },
        "hard_keypoint_mining": {
            "online_mining": false,
            "hard_to_easy_ratio": 2.0,
            "min_hard_keypoints": 2,
            "max_hard_keypoints": null,
            "loss_scale": 5.0
        },
        "early_stopping": {
            "stop_training_on_plateau": true,
            "plateau_min_delta": 1e-08,
            "plateau_patience": 20
        }
    },
    "outputs": {
        "save_outputs": true,
        "run_name": "baseline.centroid",
        "run_name_prefix": "",
        "run_name_suffix": null,
        "runs_folder": "models",
        "tags": [],
        "save_visualizations": true,
        "delete_viz_images": true,
        "zip_outputs": false,
        "log_to_csv": true,
        "checkpointing": {
            "initial_model": false,
            "best_model": true,
            "every_epoch": false,
            "latest_model": false,
            "final_model": false
        },
        "tensorboard": {
            "write_logs": false,
            "loss_frequency": "epoch",
            "architecture_graph": false,
            "profile_graph": false,
            "visualizations": true
        },
        "zmq": {
            "subscribe_to_controller": false,
            "controller_address": "tcp://127.0.0.1:9000",
            "controller_polling_timeout": 10,
            "publish_updates": false,
            "publish_address": "tcp://127.0.0.1:9001"
        }
    },
    "name": "",
    "description": "",
    "sleap_version": "1.2.3",
    "filename": "/usr/local/lib/python3.7/dist-packages/sleap/training_profiles/baseline.centroid.json"
}
INFO:sleap.nn.training:
INFO:sleap.nn.training:Using GPU 0 for acceleration.
INFO:sleap.nn.training:Disabled GPU memory pre-allocation.
INFO:sleap.nn.training:System:
GPUs: 1/1 available
  Device: /physical_device:GPU:0
         Available: True
        Initalized: False
     Memory growth: True
INFO:sleap.nn.training:
INFO:sleap.nn.training:Initializing trainer...
INFO:sleap.nn.training:Loading training labels from: proyectoprueba.pkg.slp
INFO:sleap.nn.training:Creating training and validation splits from validation fraction: 0.1
INFO:sleap.nn.training:  Splits: Training = 427 / Validation = 48.
INFO:sleap.nn.training:Setting up for training...
INFO:sleap.nn.training:Setting up pipeline builders...
INFO:sleap.nn.training:Setting up model...
INFO:sleap.nn.training:Building test pipeline...
INFO:sleap.nn.training:Loaded test example. [2.705s]
INFO:sleap.nn.training:  Input shape: (112, 96, 1)
INFO:sleap.nn.training:Created Keras model.
INFO:sleap.nn.training:  Backbone: UNet(stacks=1, filters=16, filters_rate=2.0, kernel_size=3, stem_kernel_size=7, convs_per_block=2, stem_blocks=0, down_blocks=4, middle_block=True, up_blocks=3, up_interpolate=True, block_contraction=False)
INFO:sleap.nn.training:  Max stride: 16
INFO:sleap.nn.training:  Parameters: 1,953,105
INFO:sleap.nn.training:  Heads: 
INFO:sleap.nn.training:    [0] = CentroidConfmapsHead(anchor_part=None, sigma=2.5, output_stride=2, loss_weight=1.0)
INFO:sleap.nn.training:  Outputs: 
INFO:sleap.nn.training:    [0] = KerasTensor(type_spec=TensorSpec(shape=(None, 56, 48, 1), dtype=tf.float32, name=None), name='CentroidConfmapsHead/BiasAdd:0', description="created by layer 'CentroidConfmapsHead'")
INFO:sleap.nn.training:Setting up data pipelines...
INFO:sleap.nn.training:Training set: n = 427
INFO:sleap.nn.training:Validation set: n = 48
INFO:sleap.nn.training:Setting up optimization...
INFO:sleap.nn.training:  Learning rate schedule: LearningRateScheduleConfig(reduce_on_plateau=True, reduction_factor=0.5, plateau_min_delta=1e-08, plateau_patience=5, plateau_cooldown=3, min_learning_rate=1e-08)
INFO:sleap.nn.training:  Early stopping: EarlyStoppingConfig(stop_training_on_plateau=True, plateau_min_delta=1e-08, plateau_patience=20)
INFO:sleap.nn.training:Setting up outputs...
INFO:sleap.nn.training:Created run path: models/baseline.centroid_3
INFO:sleap.nn.training:Setting up visualization...
2022-05-25 09:13:39.177656: W tensorflow/core/grappler/costs/op_level_cost_estimator.cc:690] Error in PredictCost() for the op: op: "CropAndResize" attr { key: "T" value { type: DT_FLOAT } } attr { key: "extrapolation_value" value { f: 0 } } attr { key: "method" value { s: "bilinear" } } inputs { dtype: DT_FLOAT shape { dim { size: -34 } dim { size: -35 } dim { size: -36 } dim { size: 1 } } } inputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: 4 } } } inputs { dtype: DT_INT32 shape { dim { size: -2 } } } inputs { dtype: DT_INT32 shape { dim { size: 2 } } } device { type: "GPU" vendor: "NVIDIA" model: "Tesla K80" frequency: 823 num_cores: 13 environment { key: "architecture" value: "3.7" } environment { key: "cuda" value: "11010" } environment { key: "cudnn" value: "8005" } num_registers: 131072 l1_cache_size: 16384 l2_cache_size: 1572864 shared_memory_size_per_multiprocessor: 114688 memory_size: 11320098816 bandwidth: 240480000 } outputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: -37 } dim { size: -38 } dim { size: 1 } } }
2022-05-25 09:13:40.755010: W tensorflow/core/grappler/costs/op_level_cost_estimator.cc:690] Error in PredictCost() for the op: op: "CropAndResize" attr { key: "T" value { type: DT_FLOAT } } attr { key: "extrapolation_value" value { f: 0 } } attr { key: "method" value { s: "bilinear" } } inputs { dtype: DT_FLOAT shape { dim { size: -34 } dim { size: -35 } dim { size: -36 } dim { size: 1 } } } inputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: 4 } } } inputs { dtype: DT_INT32 shape { dim { size: -2 } } } inputs { dtype: DT_INT32 shape { dim { size: 2 } } } device { type: "GPU" vendor: "NVIDIA" model: "Tesla K80" frequency: 823 num_cores: 13 environment { key: "architecture" value: "3.7" } environment { key: "cuda" value: "11010" } environment { key: "cudnn" value: "8005" } num_registers: 131072 l1_cache_size: 16384 l2_cache_size: 1572864 shared_memory_size_per_multiprocessor: 114688 memory_size: 11320098816 bandwidth: 240480000 } outputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: -37 } dim { size: -38 } dim { size: 1 } } }
Unable to use Qt backend for matplotlib. This probably means Qt is running headless.
INFO:sleap.nn.training:Finished trainer set up. [7.4s]
INFO:sleap.nn.training:Creating tf.data.Datasets for training data generation...
Traceback (most recent call last):
  File "/usr/local/bin/sleap-train", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.7/dist-packages/sleap/nn/training.py", line 1947, in main
    trainer.train()
  File "/usr/local/lib/python3.7/dist-packages/sleap/nn/training.py", line 910, in train
    training_ds = self.training_pipeline.make_dataset()
  File "/usr/local/lib/python3.7/dist-packages/sleap/nn/data/pipelines.py", line 287, in make_dataset
    ds = transformer.transform_dataset(ds)
  File "/usr/local/lib/python3.7/dist-packages/sleap/nn/data/dataset_ops.py", line 318, in transform_dataset
    self.examples = list(iter(ds))
  File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/data/ops/iterator_ops.py", line 836, in __next__
    return self._next_internal()
  File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/data/ops/iterator_ops.py", line 822, in _next_internal
    output_shapes=self._flat_output_shapes)
  File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/ops/gen_dataset_ops.py", line 2923, in iterator_get_next
    _ops.raise_from_not_ok_status(e, name)
  File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/framework/ops.py", line 7186, in raise_from_not_ok_status
    raise core._status_to_exception(e) from None  # pylint: disable=protected-access
tensorflow.python.framework.errors_impl.InvalidArgumentError: Shape of tensor EagerPyFunc [180,164,3] is not compatible with expected shape [?,?,1].
	 [[{{node EnsureShape}}]] [Op:IteratorGetNext]
2022-05-25 09:13:41.163947: W tensorflow/core/kernels/data/cache_dataset_ops.cc:768] The calling iterator did not fully read the dataset being cached. In order to avoid unexpected truncation of the dataset, the partially cached contents of the dataset  will be discarded. This can happen if you have an input pipeline similar to `dataset.cache().take(k).repeat()`. You should use `dataset.take(k).cache().repeat()` instead.

isabelperezf avatar May 25 '22 09:05 isabelperezf

Hi @isabelperezf, @auesro,

Do you mind sharing the exported data with me? The problem seems to occur when we call TensorFlow's iter() function during the creation of the training dataset. TensorFlow then raises an error due to a shape discrepancy in one of the callbacks (Shape of tensor EagerPyFunc [180,164,3] is not compatible with expected shape [?,?,1]). I would like to inspect the data (and check whether training runs locally without error).

Thanks, Liezl

roomrys avatar May 26 '22 17:05 roomrys

Hi @roomrys, I've shared my exported data with you. Thanks!

isabelperezf avatar May 27 '22 08:05 isabelperezf

Hi @isabelperezf, @auesro,

It seems that the training error is due to a mix of color and grayscale videos in the project. The network architecture used in training requires the project to have either all grayscale videos or all color videos; this is a bug:

To be clear: SLEAP should be able to handle a mix of videos imported as RGB and grayscale by converting all of them to one or the other.

I am currently working on a feature that will allow users to convert their videos from color to grayscale within the GUI, but until that feature is available, you will need to convert your videos to grayscale yourself, using a Python script or an online converter.
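A per-frame conversion can be sketched as follows. This is a minimal NumPy example, not SLEAP code: the function name and the ITU-R BT.601 luma weighting are my choices, and reading/writing the actual video file (e.g. with imageio or OpenCV) is omitted.

```python
import numpy as np

# BT.601 luma weights for converting RGB to a single luminance channel.
BT601 = np.array([0.299, 0.587, 0.114])

def to_grayscale(frame: np.ndarray) -> np.ndarray:
    """Map an (H, W, 3) RGB frame to (H, W, 1); pass through if already gray."""
    if frame.ndim == 3 and frame.shape[-1] == 3:
        gray = np.round(frame @ BT601)
        return gray.astype(frame.dtype)[..., None]
    return frame

# Example frame: pure white stays white after conversion.
frame = np.full((4, 4, 3), 255, dtype=np.uint8)
print(to_grayscale(frame).shape)  # (4, 4, 1)
```

Applying this to every frame of every video (and re-encoding) would give the project a uniform single-channel format before export.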

Thanks, Liezl

roomrys avatar Jun 06 '22 23:06 roomrys

Problem Analysis

When attempting to train a SLEAP project that has a mix of color and grayscale videos, we get an error from TensorFlow: Shape of tensor EagerPyFunc [180,164,3] is not compatible with expected shape [?,?,1]. The "Convert Image To:" GUI option does create a training config that lists the correct RGB/grayscale option, and that config is added to the training pipeline as expected. However, when creating and then transforming the dataset in Trainer.train(), the first transformer in the pipeline (the Preloader) fails before we ever get to convert the dataset to all-RGB or all-grayscale, because of the channel mix that already exists in ds = self.providers[0].make_dataset().
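The failure mode can be illustrated with a toy analogy in plain NumPy (this is not SLEAP or TensorFlow code): the pipeline infers an expected channel count from the first frame, and any later frame with a different channel count fails the ensure-shape check.

```python
import numpy as np

# One RGB frame and one grayscale frame, mimicking the mixed-video project.
rgb_frame = np.zeros((180, 164, 3), dtype=np.uint8)
gray_frame = np.zeros((180, 164, 1), dtype=np.uint8)

def check_shape(frame: np.ndarray, channels: int) -> np.ndarray:
    """Mimic a tf.ensure_shape-style check on the trailing channel axis."""
    if frame.shape[-1] != channels:
        raise ValueError(
            f"Shape of tensor {frame.shape} is not compatible with "
            f"expected shape (?, ?, {channels})."
        )
    return frame

check_shape(gray_frame, 1)  # passes
# check_shape(rgb_frame, 1) would raise, like the Colab traceback above.
```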

Proposed Solutions

Proof of Concept

If we change the following line in LabelsReader.make_dataset():

first_image = tf.convert_to_tensor(self.labels[0].image)

to keep only a single channel:

first_image = tf.convert_to_tensor(self.labels[0].image[..., 0][..., None])

and change the following line in LabelsReader.make_dataset().py_fetch_lf():

raw_image = lf.image

to:

raw_image = lf.image[..., 0][..., None]

then the training pipeline proceeds without error.
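The slicing used above can be demonstrated on a NumPy stand-in for lf.image (the real code operates on TF tensors, but the indexing semantics are the same):

```python
import numpy as np

# Stand-in for a 180x164 RGB image.
raw_image = np.arange(180 * 164 * 3, dtype=np.int32).reshape(180, 164, 3)

# [..., 0] keeps only channel 0 (dropping the channel axis);
# [..., None] restores a trailing singleton channel axis.
single_channel = raw_image[..., 0][..., None]
print(single_channel.shape)  # (180, 164, 1)
```

Note that this discards channels 1 and 2 rather than converting them, which is why it is only a proof of concept and not the proposed fix.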

Proposals

  1. Gather the dataset as all-RGB or all-grayscale in LabelsReader, depending on the configuration set with the "Convert Image To:" GUI option.
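A hedged sketch of this proposal (the function name is hypothetical, not SLEAP's actual API): normalize every frame to the channel count requested by the ensure_rgb / ensure_grayscale preprocessing config before it enters the tf.data pipeline.

```python
import numpy as np

def normalize_channels(frame: np.ndarray, ensure_grayscale: bool) -> np.ndarray:
    """Force a frame to 1 channel (grayscale) or 3 channels (RGB)."""
    if ensure_grayscale and frame.shape[-1] == 3:
        # Weighted average of RGB channels (BT.601 luma weights).
        gray = np.round(frame @ np.array([0.299, 0.587, 0.114]))
        return gray.astype(frame.dtype)[..., None]
    if not ensure_grayscale and frame.shape[-1] == 1:
        # Tile the grayscale channel into three identical RGB channels.
        return np.repeat(frame, 3, axis=-1)
    return frame

rgb = np.zeros((8, 8, 3), dtype=np.uint8)
gray = np.zeros((8, 8, 1), dtype=np.uint8)
print(normalize_channels(rgb, True).shape)    # (8, 8, 1)
print(normalize_channels(gray, False).shape)  # (8, 8, 3)
```

Applied inside the provider, every example would reach the first transformer with a consistent channel count, regardless of the mix in the source videos.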

Questions/Verifications

  • Does network performance correlate with the number of image/video channels? If the network always performs better on grayscale (or always on RGB), then maybe we should hardcode that into our pipeline.
  • What is "Convert Image To:" doing (or not doing)?

Tests

  • [ ] Mixed color/grayscale projects successfully train.

Relevant functions/files

Training Pipeline GUI

Video Backend

Training Pipeline Backend

roomrys avatar Jun 07 '22 00:06 roomrys

Hi @roomrys

Thanks a lot for your help.

Then it might have been our mistake: we thought that ticking the Grayscale option in the GUI meant the conversion would be done by SLEAP. However, it sounds like that option is only for visualization purposes, right?

That said, I am not sure we actually have a mix of color and grayscale videos: all of them were recorded with the same camera, so all should be in RGB format. Is it possible that the mistake was ticking the Grayscale option in the GUI for some videos and not for others?

Cheers, A

auesro avatar Jun 07 '22 15:06 auesro

Just want to chime in here to mention that this was addressed for inference in #639. It's possible that we're running into the same problem during training.

To be clear: SLEAP should be able to handle a mix of videos imported as RGB and grayscale by converting all of them to one or the other.

talmo avatar Jun 07 '22 16:06 talmo

Hi @auesro,

Taking a look at the Video dialog box in the GUI, you can verify how many channels each of your videos has.

Fig: The Video dialog in the SLEAP GUI displays the number of channels each video has.

Since all videos were recorded on the same camera, it could very well be that the grayscale option (when adding videos to the project) was checked for some videos but not for others.

Thanks, Liezl

roomrys avatar Jun 08 '22 00:06 roomrys

@talmo It seems the issue with the RGB/grayscale mix persists in inference as well, namely when we use a LabelsReader provider to create the dataset, which happens when "Predict On:" is set to either "user labeled frames" or "suggested frames" per the logic in LearningDialog.get_items_for_inference(). If we choose any other option for "Predict On:", then we get no predictions on the last video in the list (whether or not it matches the number of channels of the other videos), but also no errors, likely because a VideoReader provider is used instead.

To recreate:

  1. Open a project with a mix of rgb/grayscale videos
  2. Add user-labeled / suggested frames in both colored and grayscale videos
  3. Predict On: user labeled / suggested frames
  4. Observe the error below:
System:
GPUs: 1/1 available
  Device: /physical_device:GPU:0
         Available: True
        Initalized: False
     Memory growth: True

Traceback (most recent call last):
  File "C:\Users\TalmoLab\miniconda3\envs\sleap_06-08-22\Scripts\sleap-track-script.py", line 33, in <module>
    sys.exit(load_entry_point('sleap', 'console_scripts', 'sleap-track')())
  File "d:\social-leap-estimates-animal-poses\other\sleap-06-08-22\sleap\sleap\nn\inference.py", line 4295, in main
    labels_pr = predictor.predict(provider)
  File "d:\social-leap-estimates-animal-poses\other\sleap-06-08-22\sleap\sleap\nn\inference.py", line 436, in predict       
    self._make_labeled_frames_from_generator(generator, data) 
  File "d:\social-leap-estimates-animal-poses\other\sleap-06-08-22\sleap\sleap\nn\inference.py", line 2126, in _make_labeled_frames_from_generator
    for ex in generator:
  File "d:\social-leap-estimates-animal-poses\other\sleap-06-08-22\sleap\sleap\nn\inference.py", line 365, in _predict_generator
    for ex in self.pipeline.make_dataset():
  File "C:\Users\TalmoLab\miniconda3\envs\sleap_06-08-22\lib\site-packages\tensorflow\python\data\ops\iterator_ops.py", line 836, in __next__
    return self._next_internal()
  File "C:\Users\TalmoLab\miniconda3\envs\sleap_06-08-22\lib\site-packages\tensorflow\python\data\ops\iterator_ops.py", line 822, in _next_internal
    output_shapes=self._flat_output_shapes)
  File "C:\Users\TalmoLab\miniconda3\envs\sleap_06-08-22\lib\site-packages\tensorflow\python\ops\gen_dataset_ops.py", line 2922, in iterator_get_next
    _ops.raise_from_not_ok_status(e, name)
  File "C:\Users\TalmoLab\miniconda3\envs\sleap_06-08-22\lib\site-packages\tensorflow\python\framework\ops.py", line 7186, in raise_from_not_ok_status
    raise core._status_to_exception(e) from None  # pylint: disable=protected-access
tensorflow.python.framework.errors_impl.InvalidArgumentError: Shape of tensor EagerPyFunc [720,1280,1] is not compatible with expected shape [720,1280,3].
         [[{{node EnsureShape}}]] [Op:IteratorGetNext]

Process return code: 1

Both errors point to a problem with how we create the dataset in Pipeline.make_dataset() (and subsequently LabelsReader.make_dataset()). The error then occurs when we attempt to apply the first transformer.

roomrys avatar Jun 09 '22 01:06 roomrys