Issue with sv.VideoInfo FPS Handling for Precise Video Metadata Retrieval
Search before asking
- [X] I have searched the Supervision issues and found no similar feature requests.
Description
Hello,
I'm using sv.VideoInfo.from_video_path to retrieve video metadata and perform video manipulations. However, I've noticed an issue with how fps is calculated. Specifically, sv.VideoInfo.from_video_path uses the following method:
```python
fps = int(video.get(cv2.CAP_PROP_FPS))
```
This approach can lead to inaccuracies in certain scenarios. For example, if the actual FPS of a video is 24.999, the method will round this down to 24. Over a long video, this discrepancy can cause significant shifts and synchronization problems.
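To make the drift concrete, here is a back-of-the-envelope calculation (the one-hour duration is an assumed example, not from a specific video):

```python
# Drift caused by truncating a 24.999 fps stream to 24 fps.
true_fps = 24.999
truncated_fps = int(true_fps)        # 24

duration_s = 3600                    # assume a one-hour video
total_frames = true_fps * duration_s

# Timestamp implied for the last frame under each fps value:
true_time = total_frames / true_fps            # 3600.0 s
truncated_time = total_frames / truncated_fps  # noticeably longer

drift = truncated_time - true_time
print(f"drift after one hour: {drift:.1f} s")
```

Roughly two and a half minutes of drift over an hour, which is more than enough to break audio or annotation synchronization.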
Would it be possible to modify the implementation to return the FPS as a float rather than truncating it to an integer? This would improve accuracy for edge cases like this.
Thank you for your attention!
Use case
Change the fps calculation from `fps = int(video.get(cv2.CAP_PROP_FPS))` to `fps = float(video.get(cv2.CAP_PROP_FPS))`.
Additional
No response
Are you willing to submit a PR?
- [X] Yes I'd like to help by submitting a PR!
Hi @joesu-angible 👋
Thank you for reporting the issue! Indeed, half of our own asset videos don't have a round number for the fps.
Test code

```python
import supervision as sv
from supervision.assets import VideoAssets, download_assets
import cv2

# int fps: People walking, Market square, Skiing, Milk bottling plant, Vehicles
# float fps: Subway, Basketball, Grocery Store, Beach, Vehicles 2
asset = VideoAssets.VEHICLES_2
download_assets(asset)

video = cv2.VideoCapture(asset.value)
fps = video.get(cv2.CAP_PROP_FPS)
print(fps, type(fps))

video_info = sv.VideoInfo.from_video_path(asset.value)
print(video_info)
```
This would be a superb PR submission! However, since we've had the `int(fps)` code for a while, this issue requires looking at the wider impacts. There are multiple things to check, but I believe each of these is pretty simple:

- [ ] Can we still use `VideoSink` to store a video?
- [ ] Does `ByteTrack` still work if a non-int FPS is passed in? Does setting `track_seconds * fps` for the lost track buffer work?
- [ ] Does speed estimation in `inference_example.py` work?
- [ ] It's also used in coordinate calculation in `{ultralytics,inference,yolo_nas}_example.py`.
- [ ] `timers.py` for time-in-zone calculation. Used in `ultralytics{_naive}_stream_example.py`.
- [ ] Is the FPS monitor affected in any way? I don't think so, but it'd be worth checking.
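The lost-track-buffer item can be sanity-checked with plain arithmetic. A hedged sketch (the helper name and the two-second `track_seconds` value are assumptions for illustration, not supervision API):

```python
# Hypothetical helper mirroring the `track_seconds * fps` buffer math.
# ByteTrack needs the buffer as a whole number of frames, so a float fps
# has to be converted somewhere; round() loses less than int() truncation.
def lost_track_buffer(track_seconds: float, fps: float) -> int:
    return round(track_seconds * fps)

print(lost_track_buffer(2, 24.999))       # 50 frames with the true fps
print(lost_track_buffer(2, int(24.999)))  # 48 frames after int() truncation
```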
Does that sound interesting, @joesu-angible?
@LinasKo I have been testing this.
I ran the sample code for `VideoSink` for both cases, with fps as int and as float.
While the output videos are visually the same, there are minor differences in some parameters.
Hi @miteshashar, thanks for running the tests. I think the most noticeable difference shouldn't be in the file size but in the video length. Did you take such measurements as well?
Hi @SkalskiP, I did some refactoring over yesterday and today and have finished running tests for [coreml/cpu] X [float/int] X [video_sink/byte_track].
I will next work on getting the durations. That should not be too big a change. I expect to get back with these by tomorrow.
@SkalskiP So, there is a difference in the duration.
It was obtained using ffprobe, since `total_frames / fps` would naturally differ between the two cases.
I just realised that changing it to float has a desirable effect: the duration of the output videos when `VideoInfo.fps` is a float is closer to that of the original video in most cases.
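For reference, the reason `total_frames / fps` can't serve as the ground truth: with an int fps the computed duration is already biased on its own, so only a container-level probe like ffprobe gives an independent measurement. A quick illustration (the frame count is an assumed example):

```python
# Why total_frames / fps can't be the ground truth for duration:
# the int-truncated fps inflates the computed duration by itself.
total_frames = 538       # assumed frame count of a short clip
true_fps = 24.999

print(total_frames / true_fps)       # ~21.52 s with the true fps
print(total_frames / int(true_fps))  # ~22.42 s after truncation to 24
```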
Below are updates based on all the tests I have run until now.
- [x] **Tests.** All tests are passing after changing `VideoInfo.fps` to `float`.
- [x] Can we still use `VideoSink` to store a video? Looks OK to me, since the duration of the output files is the same as that of the original file. Also, the per-frame comparison for `milk-bottling-plant.mp4` that was failing earlier must have been the result of some oversight; I regenerated the files and the frames match now.
- [x] Does `ByteTrack` still work if a non-int FPS is passed in? For all the example videos, I compared the complete output trails for `int` & `float`, per frame, for the detected `class_name`, `confidence`, and `xyxy` of all objects. Zero mismatches were detected.

Code

This is part of the code I have written that covers dumping the trail and comparing two trails.

```python
def test_byte_tracker_for_asset(asset: VideoAssets, mode: str = MODEL_MODE):
    """Test ByteTracker for given asset and mode.

    Args:
        asset: Video asset to test
        mode: Model mode to use
    """
    model = get_model(mode)
    video_info = sv.VideoInfo.from_video_path(video_path=asset.value)
    frame_generator = sv.get_video_frames_generator(source_path=asset.value)
    tracker = sv.ByteTrack(frame_rate=video_info.fps)
    target_path = TestCase(BYTE_TRACKER_TEST_CLASS, prefix=mode).path(asset)
    final_detections = []
    for frame in frame_generator:
        result = model(frame)[0]
        detections = sv.Detections.from_ultralytics(result)
        detections = tracker.update_with_detections(detections)
        detection_labels = []
        if detections:
            detection_labels = [
                {
                    "class_name": str(class_name),
                    "confidence": round(float(confidence), 2),
                    "xyxy": xyxy.tolist(),
                }
                for class_name, confidence, xyxy in zip(
                    detections["class_name"],
                    detections.confidence,
                    detections.xyxy,
                )
            ]
        final_detections.append(detection_labels)
    with open(target_path.with_suffix(".yaml"), "w", encoding="utf-8") as f:
        yaml.dump(final_detections, f)


def compare_bytetrack_yaml_for_asset(asset: VideoAssets, a: TestCase, b: TestCase):
    """Compare ByteTrack YAML outputs between two test cases for given asset.

    Args:
        asset: Video asset to compare
        a: First test case
        b: Second test case
    """
    print(f'Comparing ByteTrack YAML outputs for "{asset.name}" between:')
    print(f"{a} & {b}")
    a_path = a.path(asset)
    b_path = b.path(asset)
    assert a_path.exists() and b_path.exists()
    with open(a_path, "r", encoding="utf-8") as f:
        a_data = yaml.load(f, Loader=yaml.Loader)
    with open(b_path, "r", encoding="utf-8") as f:
        b_data = yaml.load(f, Loader=yaml.Loader)
    diff = []
    mismatch_counter = {
        "frames": 0,
        "object": 0,
        "confidence": 0,
        "xyxy": 0,
    }
    for a_frame, b_frame in zip(a_data, b_data):
        mismatches = {
            "object": abs(len(a_frame) - len(b_frame)),
            "confidence": 0,
            "xyxy": 0,
        }
        for idx in range(min(len(a_frame), len(b_frame))):
            a_frame_object = a_frame[idx]
            b_frame_object = b_frame[idx]
            try:
                if a_frame_object["class_name"] != b_frame_object["class_name"]:
                    mismatches["object"] += 1
                    mismatch_counter["object"] += 1
                if a_frame_object["confidence"] != b_frame_object["confidence"]:
                    mismatches["confidence"] += 1
                    mismatch_counter["confidence"] += 1
                if not (
                    np.array(a_frame_object["xyxy"])
                    == np.array(b_frame_object["xyxy"])
                ).all():
                    mismatches["xyxy"] += 1
                    mismatch_counter["xyxy"] += 1
            except Exception as e:
                print(f"a_frame_object: {a_frame_object}")
                print(f"b_frame_object: {b_frame_object}")
                raise e
        diff.append(mismatches)
        if mismatches["object"] or mismatches["confidence"] or mismatches["xyxy"]:
            mismatch_counter["frames"] += 1
    if mismatch_counter["frames"]:
        print(
            f"Found mismatches for total {mismatch_counter['frames']} frames "
            f"for {asset.name} between {a} and {b}"
        )
        print(f"Total Object mismatches: {mismatch_counter['object']}")
        print(f"Total Confidence mismatches: {mismatch_counter['confidence']}")
        print(f"Total XYXY mismatches: {mismatch_counter['xyxy']}")
    else:
        print(f"No mismatches found for {asset.name} between {a} and {b}")
    # Write mismatches to a YAML file
    stem = a_path.stem.replace(f"_{a.fps_type}", "_diff")
    target_path = a_path.parent / f"{stem}.yaml"
    with open(target_path, "w", encoding="utf-8") as f:
        yaml.dump(diff, f)
```
- [ ] Does speed estimation in `inference_example.py` work?
- [ ] It's also used in coordinate calculation in `{ultralytics,inference,yolo_nas}_example.py`.
I tested with `examples/speed_estimation/ultralytics_example.py` using the `yolo8x.mlpackage` model. Since the example uses the value of `fps` to create a `deque`, it fails with the current code when `fps` is of `float` type. I changed the `deque` declaration to `coordinates = defaultdict(lambda: deque(maxlen=int(video_info.fps)))`. I then compared the calculated speeds for both cases. There was no difference in the speeds for `vehicles.mp4`, but there were a lot of mismatches in the speeds for `vehicles-2.mp4`, which I now realize could be because of:

```python
SOURCE = np.array([[1252, 787], [2298, 803], [5039, 2159], [-550, 2159]])
TARGET_WIDTH = 25
TARGET_HEIGHT = 250
```
The following tests now remain.
- [ ] Does setting `track_seconds * fps` for the lost track buffer work? I will run the example in `heatmap_and_track/scripts.py` and validate it.
- [ ] `timers.py` for time-in-zone calculation. Used in `ultralytics{_naive}_stream_example.py`.
- [ ] Is the FPS monitor affected in any way? I don't think so, but it'd be worth checking.