Not all memory released in video predictions

Open JoeGaffney opened this issue 1 month ago • 2 comments

Potential GPU memory leak when reusing `Sam3VideoPredictorMultiGPU`

Description

Hey, GPU memory usage seems to double when creating multiple `Sam3VideoPredictorMultiGPU` instances, even after calling `shutdown()`. If I only load the predictor several times and clear it each time, memory is released correctly. But once a video has been processed, the memory is retained.

Any insights would be appreciated.

Reproduce

def test_memory_leak():
    import torch
    from huggingface_hub import hf_hub_download
    from sam3.model_builder import build_sam3_video_predictor

    video_path = "../assets/act_reference_v001.mp4"

    def get_bpe_vocab_path() -> str:
        """seems missing in the sam3 package, so we add it here"""
        return hf_hub_download(
            repo_id="LanguageBind/LanguageBind", filename="open_clip/bpe_simple_vocab_16e6.txt.gz", repo_type="space"
        )

    # First run WITH video processing
    predictor1 = build_sam3_video_predictor(bpe_path=get_bpe_vocab_path())
    response = predictor1.handle_request(dict(type="start_session", resource_path=video_path))
    session_id = response["session_id"]
    predictor1.handle_request(dict(type="add_prompt", session_id=session_id, frame_index=0, text="person"))
    predictor1.handle_request(dict(type="close_session", session_id=session_id))
    predictor1.shutdown()
    del predictor1
    torch.cuda.empty_cache()

    print(f"After first predictor: {torch.cuda.memory_allocated() / 1024**3:.2f} GiB")

    # Second run
    predictor2 = build_sam3_video_predictor(bpe_path=get_bpe_vocab_path())
    response = predictor2.handle_request(dict(type="start_session", resource_path=video_path))
    session_id = response["session_id"]
    predictor2.handle_request(dict(type="add_prompt", session_id=session_id, frame_index=0, text="person"))
    predictor2.handle_request(dict(type="close_session", session_id=session_id))
    predictor2.shutdown()
    del predictor2
    torch.cuda.empty_cache()

    print(f"After second predictor: {torch.cuda.memory_allocated() / 1024**3:.2f} GiB")
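One check that might narrow this down is whether the predictor object itself is actually garbage-collected after `shutdown()` and `del`, or whether something still holds a strong reference to it. The sketch below is not part of the repro. `is_collected_after_release` is a hypothetical helper that uses only the standard library, and it assumes the object under test supports weak references (true for ordinary Python classes without `__slots__`).

```python
import gc
import weakref


def is_collected_after_release(make_obj, release):
    """Build an object, release it, and report whether it was really freed.

    make_obj: zero-argument callable returning the object under test.
    release:  callable that takes the object and runs its cleanup
              (for the repro above this would call shutdown()).

    Returns (collected, referrer_types). collected is True when no strong
    reference survives; otherwise referrer_types lists the type names of
    the containers that still point at the object.
    """
    obj = make_obj()
    ref = weakref.ref(obj)  # a weak reference does not keep obj alive
    release(obj)
    del obj                 # drop our own strong reference
    gc.collect()            # also frees reference cycles that refcounting misses

    survivor = ref()
    if survivor is None:
        return True, []
    referrer_types = sorted({type(r).__name__ for r in gc.get_referrers(survivor)})
    return False, referrer_types
```

For this issue, `make_obj` would wrap `build_sam3_video_predictor(...)` plus the session requests, and `release` would call `shutdown()`. If the result is `(False, [...])`, the listed referrer types hint at what keeps the predictor (and possibly its CUDA tensors) alive. If it is `(True, [])`, the retained memory is more likely held outside the predictor object, for example in module-level or cached state.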

Result

Allocated GPU memory (`torch.cuda.memory_allocated()`) doubles (1.52 → 3.04 GiB) after the second run instead of being freed after `shutdown()`.

docker compose exec gpu-workers pytest tests/videos/local/test_sam_3_video.py -vs
============================= test session starts ==============================
platform linux -- Python 3.11.12, pytest-9.0.1, pluggy-1.5.0 -- /opt/conda/bin/python3.11
cachedir: .pytest_cache
hypothesis profile 'default' -> database=DirectoryBasedExampleDatabase(PosixPath('/app/workers/.hypothesis/examples'))
rootdir: /app/workers
configfile: pytest.ini
plugins: hypothesis-6.131.7, anyio-4.12.0, hydra-core-1.3.2
collected 1 item

tests/videos/local/test_sam_3_video.py::test_memory_leak INFO 2025-12-05 13:51:37,744 7 sam3_video_predictor.py: 299: using the following GPU IDs: [0]
INFO 2025-12-05 13:51:37,744 7 sam3_video_predictor.py: 315:


        *** START loading model on all ranks ***


INFO 2025-12-05 13:51:37,745 7 sam3_video_predictor.py: 317: loading model on rank=0 with world_size=1 -- this could take a while ...
INFO 2025-12-05 13:51:45,476 7 sam3_video_base.py: 124: setting max_num_objects=10000 and num_obj_for_compile=16
INFO 2025-12-05 13:51:48,583 7 sam3_video_predictor.py: 319: loading model on rank=0 with world_size=1 -- DONE locally
INFO 2025-12-05 13:51:48,583 7 sam3_video_predictor.py: 330:


        *** DONE loading model on all ranks ***


frame loading (OpenCV) [rank=0]: 100%|██████████| 121/121 [00:00<00:00, 509.64it/s]
INFO 2025-12-05 13:51:57,976 7 sam3_video_predictor.py: 250: removed session c6a21b18-9143-46cb-8d04-38e74855da92; live sessions: [], GPU memory: 5108 MiB used and 7604 MiB reserved (max over time: 7207 MiB used and 7604 MiB reserved)
After first predictor: 1.52 GiB
INFO 2025-12-05 13:51:58,206 7 sam3_video_predictor.py: 299: using the following GPU IDs: [0]
INFO 2025-12-05 13:51:58,206 7 sam3_video_predictor.py: 315:


        *** START loading model on all ranks ***


INFO 2025-12-05 13:51:58,207 7 sam3_video_predictor.py: 317: loading model on rank=0 with world_size=1 -- this could take a while ...
INFO 2025-12-05 13:52:05,620 7 sam3_video_base.py: 124: setting max_num_objects=10000 and num_obj_for_compile=16
INFO 2025-12-05 13:52:08,733 7 sam3_video_predictor.py: 319: loading model on rank=0 with world_size=1 -- DONE locally
INFO 2025-12-05 13:52:08,734 7 sam3_video_predictor.py: 330:


        *** DONE loading model on all ranks ***


frame loading (OpenCV) [rank=0]: 100%|██████████| 121/121 [00:00<00:00, 563.67it/s]
INFO 2025-12-05 13:52:09,973 7 sam3_video_predictor.py: 250: removed session 59ddc39d-2c84-428e-a730-d5628da67de5; live sessions: [], GPU memory: 6660 MiB used and 9044 MiB reserved (max over time: 8762 MiB used and 9044 MiB reserved)
After second predictor: 3.04 GiB
PASSED
2025-12-05 13:52:10,427 - WARNING - Cleaned GPU memory: GPU Memory Usage: 5.33GiB / 31.84GiB,  Reserved: 3.51GiB, Allocated: 3.04GiB, Usage: 16.73%, Allocated Percent: 9.53%

---------------------------- live log sessionfinish ----------------------------
WARNING  common.logger:memory.py:57 Cleaned GPU memory: GPU Memory Usage: 5.33GiB / 31.84GiB,  Reserved: 3.51GiB, Allocated: 3.04GiB, Usage: 16.73%, Allocated Percent: 9.53%

============================== slowest durations ===============================
33.24s call     tests/videos/local/test_sam_3_video.py::test_memory_leak

(2 durations < 1s hidden.)
============================== 1 passed in 36.59s ==============================
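The two "removed session" log lines also allow a quick cross-check of the leak size. The sketch below parses the `GPU memory: ... MiB used` figures from those lines and computes the growth between sessions. The helper names `session_memory` and `growth_gib` are hypothetical, and I am not sure which counter the predictor reports as "used", so treat the comparison as indicative only.

```python
import re

# Matches the leading "GPU memory: X MiB used and Y MiB reserved" pair.
# The "max over time" pair on the same line is deliberately not matched.
_MEM_RE = re.compile(r"GPU memory: (\d+) MiB used and (\d+) MiB reserved")


def session_memory(log_text):
    """Return one (used_mib, reserved_mib) tuple per matching log line."""
    return [(int(used), int(reserved)) for used, reserved in _MEM_RE.findall(log_text)]


def growth_gib(log_text):
    """Growth in 'used' memory between consecutive sessions, in GiB."""
    used = [u for u, _ in session_memory(log_text)]
    return [(later - earlier) / 1024 for earlier, later in zip(used, used[1:])]


# The two relevant lines from the log above.
log = """
removed session c6a21b18-9143-46cb-8d04-38e74855da92; live sessions: [], GPU memory: 5108 MiB used and 7604 MiB reserved (max over time: 7207 MiB used and 7604 MiB reserved)
removed session 59ddc39d-2c84-428e-a730-d5628da67de5; live sessions: [], GPU memory: 6660 MiB used and 9044 MiB reserved (max over time: 8762 MiB used and 9044 MiB reserved)
"""

print(session_memory(log))  # [(5108, 7604), (6660, 9044)]
print(growth_gib(log))      # [1.515625]
```

The growth of 1552 MiB (about 1.52 GiB) between the two sessions matches the 1.52 GiB still allocated after the first predictor, which is consistent with roughly one run's worth of memory being retained per predictor.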

Dec 05 '25 13:12 JoeGaffney
