Prerequisites

Please answer the following questions for yourself before submitting an issue.

[x] I am using the latest TensorFlow Model Garden release and TensorFlow 2.
[x] I am reporting the issue to the correct repository. (Model Garden official or research directory)
[x] I checked to make sure that this issue has not been filed already.

1. The entire URL of the file you are using

https://github.com/tensorflow/models/tree/master/official/projects/movinet

2. Describe the bug

A minimal code example is provided in the "Steps to reproduce" section.

I initialized a pretrained MoViNet A0 stream model from TF Hub: https://tfhub.dev/tensorflow/movinet/a0/stream/kinetics-600/classification/
I initialized a pretrained MoViNet A0 stream model from the checkpoint: https://storage.googleapis.com/tf_model_garden/vision/movinet/movinet_a0_stream.tar.gz

The logit outputs of the two models differ considerably, even though I have verified that the TF Hub model and the checkpoint contain the same weights.
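For anyone trying to double-check the weight claim, a pairwise comparison is one option. This is a minimal NumPy sketch, assuming each model's weights have already been exported as a list of arrays in matching order (for example `[v.numpy() for v in model.variables]`). The helper name `compare_weights` is hypothetical, and the variable order of a `hub.KerasLayer` and a locally built model is not guaranteed to line up, so pairing by name or shape may be needed in practice.

```python
import numpy as np


def compare_weights(weights_a, weights_b, atol=1e-6):
    """Compare two equally ordered lists of weight arrays.

    Returns a list of (index, reason) tuples, one per mismatching pair;
    index -1 flags a difference in the number of arrays. An empty list
    means every pair has the same shape and agrees within `atol`.
    """
    problems = []
    if len(weights_a) != len(weights_b):
        problems.append((-1, f"count differs: {len(weights_a)} vs {len(weights_b)}"))
    for i, (a, b) in enumerate(zip(weights_a, weights_b)):
        a = np.asarray(a)
        b = np.asarray(b)
        if a.shape != b.shape:
            problems.append((i, f"shape {a.shape} vs {b.shape}"))
        elif not np.allclose(a, b, atol=atol, rtol=0.0):
            max_diff = float(np.max(np.abs(a - b)))
            problems.append((i, f"max abs diff {max_diff:.3g}"))
    return problems


# Toy usage with synthetic arrays standing in for real model weights.
w1 = [np.ones((3, 3)), np.zeros(4)]
w2 = [np.ones((3, 3)), np.full(4, 0.5)]
print(compare_weights(w1, w1))  # []
print(compare_weights(w1, w2))  # [(1, 'max abs diff 0.5')]
```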

3. Steps to reproduce

Before running the unit test, download and extract the checkpoint:

wget https://storage.googleapis.com/tf_model_garden/vision/movinet/movinet_a0_stream.tar.gz -O movinet_a0_stream_.tar.gz -q
tar -xvf movinet_a0_stream_.tar.gz

import unittest
from typing import Tuple, Dict
import tensorflow_hub as hub
import tensorflow as tf
from six.moves import urllib
from io import BytesIO
from PIL import Image
from official.projects.movinet.modeling import movinet
from official.projects.movinet.modeling import movinet_model
import numpy as np

# Configuration: one 172x172 RGB frame per streaming step.
model_id = 'a0'
num_classes = 600
H = W = 172
C = 3
T = 1
bs = 1
dummy_input = tf.random.normal(shape=[bs, T, H, W, C])


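def create_hub_model(model_id) -> Tuple[hub.KerasLayer, Dict]:
    """Loads the streaming classifier from TF Hub together with its initial states."""
    hub_url = f"https://tfhub.dev/tensorflow/movinet/{model_id}/stream/kinetics-600/classification/"
    model_hub = hub.KerasLayer(hub_url)
    # The exported 'init_states' signature builds the initial streaming states for the input shape.
    init_states_fn = model_hub.resolved_object.signatures['init_states']
    init_states = init_states_fn(tf.shape(dummy_input))
    return model_hub, init_states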


def create_local(model_id) -> Tuple[movinet_model.MovinetClassifier, Dict]:
    """Builds the streaming classifier from source and restores the checkpoint."""
    backbone = movinet.Movinet(
        model_id=model_id,
        causal=True,
        conv_type='2plus1d',
        se_type='2plus3d',
        activation='hard_swish',
        gating_activation='hard_sigmoid',
        use_positional_encoding=False,
        use_external_states=True,
    )
    backbone.trainable = False
    model = movinet_model.MovinetClassifier(
        backbone,
        num_classes=num_classes,
        output_states=True
    )
    # Restore the weights extracted from movinet_{model_id}_stream.tar.gz.
    checkpoint_dir = f'movinet_{model_id}_stream'
    checkpoint_path = tf.train.latest_checkpoint(checkpoint_dir)
    checkpoint = tf.train.Checkpoint(model=model)
    status = checkpoint.restore(checkpoint_path).expect_partial()
    # Raise if any already-created variable of the model has no matching value in the checkpoint.
    status.assert_existing_objects_matched()
    init_states_local = model.init_states(tf.shape(dummy_input))
    return model, init_states_local


class MyTestCase(unittest.TestCase):

    def test_hub_equal_source(self):
        model_hub, states_hub = create_hub_model(model_id)
        # Use a single still image as a one-frame clip with pixel values in [0, 1].
        image_url = 'https://upload.wikimedia.org/wikipedia/commons/8/84/Ski_Famille_-_Family_Ski_Holidays.jpg'
        with urllib.request.urlopen(image_url) as f:
            # PIL's resize expects (width, height); convert('RGB') guarantees 3 channels.
            image = Image.open(BytesIO(f.read())).convert('RGB').resize((W, H))
        X = tf.reshape(np.array(image), [1, 1, H, W, C])
        X = tf.cast(X, tf.float32) / 255
        # Both models return (logits, updated_states); only the logits are compared.
        y_hub, _ = model_hub({**states_hub, 'image': X})
        print(y_hub[0][0:5])
        model_local, states_local = create_local(model_id)
        y_local, _ = model_local({**states_local, 'image': X})
        print(y_local[0][0:5])
        tf.debugging.assert_near(y_local, y_hub, atol=1e-3)


if __name__ == '__main__':
    unittest.main()

4. Expected behavior

The output logits of the hub model and the checkpoint model should be close. However, they differ considerably.
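Beyond the pass/fail result of `tf.debugging.assert_near`, it can help to quantify how large the gap is and whether it changes the predicted classes. A minimal NumPy sketch, assuming each model's logits for the single clip are first converted to an array (for example `y_hub.numpy()[0]` and `y_local.numpy()[0]`); the helper name `summarize_logit_gap` is hypothetical.

```python
import numpy as np


def summarize_logit_gap(logits_a, logits_b, k=5):
    """Summarize how far apart two logit vectors for one example are.

    Reports the largest absolute difference, whether the top-1 classes
    agree, and how many of the top-k class indices the two vectors share.
    """
    a = np.asarray(logits_a, dtype=np.float64).ravel()
    b = np.asarray(logits_b, dtype=np.float64).ravel()
    # Indices of the k largest logits, highest first.
    top_a = np.argsort(a)[::-1][:k]
    top_b = np.argsort(b)[::-1][:k]
    return {
        "max_abs_diff": float(np.max(np.abs(a - b))),
        "top1_match": bool(top_a[0] == top_b[0]),
        "topk_overlap": len(set(top_a.tolist()) & set(top_b.tolist())),
    }


# Toy usage with synthetic logits (not outputs of the real models).
a = [2.0, 1.0, 0.5, -1.0, 3.0, 0.0]
b = [2.1, 0.9, 0.4, -1.2, 2.8, 0.1]
print(summarize_logit_gap(a, b, k=3))
```

If the top-k classes agree while the maximum difference exceeds the tolerance, the models may differ only numerically; if the top classes diverge, the difference is functional.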

5. Additional context

Dependencies for the test: numpy, Pillow==11.1.0, six==1.17.0, tensorflow[and-cuda]==2.18.1, tensorflow_hub==0.16.1, tf_models_official==2.18.00
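To compare environments when reproducing, the installed versions of the packages listed above can be collected with the standard library. A minimal sketch; the helper name `installed_versions` is hypothetical, and it looks up distribution names, so the `[and-cuda]` extra is dropped and `tf_models_official` is written as `tf-models-official`.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


deps = ["numpy", "Pillow", "six", "tensorflow", "tensorflow_hub", "tf-models-official"]
for name, version in installed_versions(deps).items():
    print(f"{name}=={version}" if version else f"{name}: not installed")
```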

6. System information

OS Platform and Distribution: Ubuntu 22.04.5 LTS
TensorFlow installed from (source or binary): binary
TensorFlow version: 2.18.1
Python version: 3.10.12
CUDA/cuDNN version: cuda_12.8.r12.8
GPU model and memory: NVIDIA GeForce RTX 4090, 24GB

Mar 31 '25 14:03 gilgoldm

Below are the two key changes needed:

Enable Positional Encoding

Original:

use_positional_encoding=False

Update to:

use_positional_encoding=True

Add Dropout and Drop Connect Rates

These parameters are used in the TF Hub model and must be explicitly set in the local model to ensure parity.

Update classifier config:

dropout_rate=0.2

Update backbone config:

drop_connect_rate=0.2
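For context when weighing these rates: in the standard inverted-dropout formulation, dropout only perturbs activations during training and is an identity at inference, which matches how Keras `Dropout` behaves when called with `training=False`. The NumPy sketch below illustrates that formulation only; `inverted_dropout` is a hypothetical function, not the Keras or Model Garden implementation.

```python
import numpy as np


def inverted_dropout(x, rate, training, rng=None):
    """Inverted dropout: zero a fraction `rate` of units and rescale, during training only."""
    x = np.asarray(x, dtype=np.float64)
    if not training or rate == 0.0:
        # Inference path: activations pass through unchanged.
        return x
    rng = np.random.default_rng() if rng is None else rng
    keep = rng.random(x.shape) >= rate
    # Scale kept units by 1 / (1 - rate) so the expected activation is unchanged.
    return np.where(keep, x / (1.0 - rate), 0.0)


x = np.arange(1.0, 7.0)
print(inverted_dropout(x, rate=0.2, training=False))  # [1. 2. 3. 4. 5. 6.]
print(inverted_dropout(x, rate=0.2, training=True, rng=np.random.default_rng(0)))
```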

Apr 11 '25 05:04 Jiya873

@Jiya873 Thank you for the reply.

drop_connect_rate is not an input parameter to movinet.Movinet. Did you mean stochastic_depth_drop_rate?
Regarding use_positional_encoding: when this parameter is set to True, the model no longer matches the checkpoint. Note that the test runs the a0 stream model. The following exception is raised by status.assert_existing_objects_matched():

AssertionError: Found 15 Python objects that were not bound to checkpointed values, likely due to changes in the Python program. Showing 10 of 15 unmatched objects: [<tf.Variable 'scale:0' shape=() dtype=float32, numpy=0.0>, <tf.Variable 'scale:0' shape=() dtype=float32, numpy=0.0>, <tf.Variable 'scale:0' shape=() dtype=float32, numpy=0.0>, <tf.Variable 'scale:0' shape=() dtype=float32, numpy=0.0>, <tf.Variable 'scale:0' shape=() dtype=float32, numpy=0.0>, <tf.Variable 'scale:0' shape=() dtype=float32, numpy=0.0>, <tf.Variable 'scale:0' shape=() dtype=float32, numpy=0.0>, <tf.Variable 'scale:0' shape=() dtype=float32, numpy=0.0>, <tf.Variable 'scale:0' shape=() dtype=float32, numpy=0.0>, <tf.Variable 'scale:0' shape=() dtype=float32, numpy=0.0>]

This is consistent with the paper and the documentation, which state that streaming models use positional encodings only from a3 upward (a3 included). The test used a0.
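The rule stated above can be encoded so that the backbone flag follows the model id. A minimal sketch; the helper name `uses_positional_encoding` is hypothetical, and the a3 to a5 set reflects the statement in this thread (the MoViNet family runs from a0 to a5) rather than a lookup in the Model Garden code.

```python
# MoViNet variants that, per the statement above, use positional encodings when streaming.
POSITIONAL_ENCODING_MODELS = frozenset({"a3", "a4", "a5"})
KNOWN_MODELS = frozenset(f"a{i}" for i in range(6))


def uses_positional_encoding(model_id: str) -> bool:
    """Return True if the given MoViNet variant is expected to use positional encodings."""
    if model_id not in KNOWN_MODELS:
        raise ValueError(f"unknown MoViNet model id: {model_id!r}")
    return model_id in POSITIONAL_ENCODING_MODELS


print(uses_positional_encoding("a0"))  # False
print(uses_positional_encoding("a3"))  # True
```

In the test, the result could be passed as `use_positional_encoding=uses_positional_encoding(model_id)` when constructing `movinet.Movinet`.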

I also tested the suggested changes on the a3 stream model, using the same test. In that case assert_existing_objects_matched raises no AssertionError, but the outputs of the hub and source models still do not match.

Apr 11 '25 11:04 gilgoldm

Movinet hub and source output differ
