
[Bug] VRAM is not released when using multiple models

Open hebangwen opened this issue 7 months ago • 1 comment

Hi, thanks for your contribution in building this evaluation kit. I recently used it to reproduce Qwen2.5-VL-3B-Instruct and Qwen2.5-VL-7B-Instruct. I constructed a config with these two models and one dataset, shown below. However, the VRAM allocated by the first model, Qwen2.5-VL-3B-Instruct, does not seem to be released after that model finishes. As the nvidia-smi output shows, the VRAM in use is nearly the sum of what the 3B and 7B models need individually.


config:

{
    "model": {
        "Qwen2.5-VL-3B-Instruct-edge": {
            "class": "Qwen2VLChat",
            "model_path": "Qwen/Qwen2.5-VL-3B-Instruct",
            "min_pixels": 3136,
            "max_pixels": 802816,
            "use_custom_prompt": false
        },
        "Qwen2.5-VL-7B-Instruct-edge": {
            "class": "Qwen2VLChat",
            "model_path": "Qwen/Qwen2.5-VL-7B-Instruct",
            "min_pixels": 3136,
            "max_pixels": 802816,
            "use_custom_prompt": false
        }
    },
    "data": {
        "MMMU_DEV_VAL": {
            "class": "MMMUDataset",
            "dataset": "MMMU_DEV_VAL"
        }
    }
}

nvidia-smi:

+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 560.35.05              Driver Version: 560.35.05      CUDA Version: 12.6     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 4090 D      Off |   00000000:01:00.0 Off |                  Off |
| 37%   62C    P2            286W /  425W |   23388MiB /  24564MiB |     96%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A   3805914      C   python                                      23378MiB |
+-----------------------------------------------------------------------------------------+
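The symptom suggests that the first model object is still referenced (and its cached CUDA blocks still held) when the second model is loaded. A minimal sketch of the kind of per-model cleanup that would avoid this is below; it is not VLMEvalKit's actual code, and build_model / evaluate are hypothetical callables standing in for whatever the kit does internally:

```python
import gc
import torch


def evaluate_sequentially(model_cfgs, dataset, build_model, evaluate):
    # build_model and evaluate are hypothetical placeholders for however the
    # kit constructs one model and runs one dataset through it.
    for name, cfg in model_cfgs.items():
        model = build_model(cfg)
        evaluate(model, dataset)
        # Drop the only reference to the finished model, force a GC pass, and
        # ask PyTorch to return its cached CUDA blocks to the driver, so the
        # next model does not stack on top of the previous one's VRAM.
        del model
        gc.collect()
        torch.cuda.empty_cache()
```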

hebangwen avatar Jun 12 '25 04:06 hebangwen

Hi @hebangwen, thanks for pointing this out; we will look into how to fix the problem. For now, a simple workaround is to write a for loop in bash and evaluate a single model each time.
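For example, here is the same idea expressed as a small Python driver (a sketch; it assumes run.py accepts the documented single-model --data / --model flags, and the model names should be adjusted to whatever your setup uses). Each evaluation runs in its own process, so all of its VRAM is returned to the driver when that process exits:

```python
import subprocess

MODELS = ["Qwen2.5-VL-3B-Instruct", "Qwen2.5-VL-7B-Instruct"]

for model in MODELS:
    # One fresh process per model: the CUDA context (and all cached VRAM)
    # is destroyed when the process exits, before the next model loads.
    subprocess.run(
        ["python", "run.py", "--data", "MMMU_DEV_VAL", "--model", model],
        check=True,
    )
```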

kennymckormick avatar Jun 16 '25 11:06 kennymckormick