The first sleep call during initialization does not release memory

llan-ml opened this issue.

env:

ms-swift==3.10.0
transformers==4.57.1
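
The list above covers ms-swift and transformers only. Sleep mode is implemented in vLLM, and the script below runs on Ascend NPUs (`ASCEND_RT_VISIBLE_DEVICES`), presumably through the vllm-ascend plugin. That makes the vLLM, vllm-ascend, torch and torch-npu versions relevant too. Here is a small stdlib sketch that prints them. The distribution names in `PACKAGES` are assumptions and may differ from what `pip list` shows.

```python
import platform
from importlib.metadata import PackageNotFoundError, version

# Assumed pip distribution names; adjust to match `pip list` on the machine.
PACKAGES = ["ms-swift", "transformers", "vllm", "vllm-ascend", "torch", "torch-npu"]


def collect_versions(packages=PACKAGES):
    """Return {name: version} for each distribution, marking missing ones."""
    report = {"python": platform.python_version()}
    for name in packages:
        try:
            report[name] = version(name)
        except PackageNotFoundError:
            report[name] = "not installed"
    return report


if __name__ == "__main__":
    for name, ver in collect_versions().items():
        print(f"{name}=={ver}")
```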

I added a print statement right after the first sleep call:

```python
...
            context = self.offload_context if self.enable_offload else nullcontext

            with context():
                self.engine = self._prepare_vllm_engine()
                if args.sleep_level > 0:
                    time.sleep(5)
                    self.engine.engine.reset_prefix_cache()
                    self.engine.engine.sleep(args.sleep_level)
                    print("=====================" * 12, "first sleep", args.sleep_level)
        self.dynamic_num_samples = False  # grpo multi-turn
...
```
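
The print only confirms that `sleep` returned. To measure the release independently of vLLM's own log message, one option is to read the device's free memory before and after the call. In this sketch, `mem_get_info` is any callable returning `(free_bytes, total_bytes)`, like `torch.cuda.mem_get_info`. The assumption is that torch_npu exposes the same function as `torch.npu.mem_get_info`. The figure is device-wide, so other work on the same device can skew it.

```python
from typing import Callable, Tuple

GiB = 1024 ** 3


def measure_freed(mem_get_info: Callable[[], Tuple[int, int]],
                  action: Callable[[], None]) -> float:
    """Run `action` and return how many GiB of device memory it released.

    `mem_get_info` must return (free_bytes, total_bytes) for the device.
    """
    free_before, _ = mem_get_info()
    action()
    free_after, _ = mem_get_info()
    return (free_after - free_before) / GiB


# Possible use inside the snippet above (torch.npu.mem_get_info is assumed):
# freed = measure_freed(torch.npu.mem_get_info,
#                       lambda: self.engine.engine.sleep(args.sleep_level))
# print("first sleep freed", round(freed, 2), "GiB")
```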

The resulting log shows that every sleep call frees 0.00 GiB:

INFO 11-17 17:06:50 [worker_v1.py:171] Sleep mode freed 0.00 GiB memory, 23.66 GiB memory is still in use.
INFO 11-17 17:06:50 [executor_base.py:189] It took 0.000855 seconds to fall asleep.
============================================================================================================================================================================================================================================================ first sleep 1
INFO 11-17 17:06:50 [block_pool.py:378] Successfully reset prefix cache
INFO 11-17 17:06:50 [worker_v1.py:171] Sleep mode freed 0.00 GiB memory, 23.66 GiB memory is still in use.
INFO 11-17 17:06:50 [executor_base.py:189] It took 0.000849 seconds to fall asleep.
============================================================================================================================================================================================================================================================ first sleep 1
INFO 11-17 17:06:50 [block_pool.py:378] Successfully reset prefix cache
INFO 11-17 17:06:50 [worker_v1.py:171] Sleep mode freed 0.00 GiB memory, 23.66 GiB memory is still in use.
INFO 11-17 17:06:50 [executor_base.py:189] It took 0.000957 seconds to fall asleep.
============================================================================================================================================================================================================================================================ first sleep 1
INFO 11-17 17:06:50 [block_pool.py:378] Successfully reset prefix cache
INFO 11-17 17:06:50 [worker_v1.py:171] Sleep mode freed 0.00 GiB memory, 23.66 GiB memory is still in use.
INFO 11-17 17:06:50 [executor_base.py:189] It took 0.000891 seconds to fall asleep.
============================================================================================================================================================================================================================================================ first sleep 1
INFO 11-17 17:06:50 [block_pool.py:378] Successfully reset prefix cache
INFO 11-17 17:06:50 [worker_v1.py:171] Sleep mode freed 0.00 GiB memory, 23.66 GiB memory is still in use.
INFO 11-17 17:06:50 [executor_base.py:189] It took 0.000896 seconds to fall asleep.
============================================================================================================================================================================================================================================================ first sleep 1
INFO 11-17 17:06:50 [block_pool.py:378] Successfully reset prefix cache
INFO 11-17 17:06:50 [worker_v1.py:171] Sleep mode freed 0.00 GiB memory, 23.66 GiB memory is still in use.
INFO 11-17 17:06:50 [executor_base.py:189] It took 0.000873 seconds to fall asleep.
============================================================================================================================================================================================================================================================ first sleep 1
INFO 11-17 17:06:50 [block_pool.py:378] Successfully reset prefix cache
INFO 11-17 17:06:50 [worker_v1.py:171] Sleep mode freed 0.00 GiB memory, 23.66 GiB memory is still in use.
INFO 11-17 17:06:50 [executor_base.py:189] It took 0.000895 seconds to fall asleep.
============================================================================================================================================================================================================================================================ first sleep 1
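
With one such block per process, a quick way to check a full log is to scan it for the `Sleep mode freed ... GiB memory` message and count the calls that released almost nothing. This is a minimal sketch. The regex is written against the message format shown above, and the 0.01 GiB threshold is an arbitrary choice. Applied to the excerpt above, `ineffective_sleeps` counts all seven sleep messages.

```python
import re
from typing import List, Tuple

# Matches the vLLM worker message format shown in the log above.
SLEEP_RE = re.compile(
    r"Sleep mode freed (?P<freed>[\d.]+) GiB memory, "
    r"(?P<used>[\d.]+) GiB memory is still in use"
)


def parse_sleep_lines(log_text: str) -> List[Tuple[float, float]]:
    """Return (freed_gib, still_used_gib) for every sleep message in the log."""
    return [
        (float(m.group("freed")), float(m.group("used")))
        for m in SLEEP_RE.finditer(log_text)
    ]


def ineffective_sleeps(log_text: str, threshold_gib: float = 0.01) -> int:
    """Count sleep calls that freed less than `threshold_gib`."""
    return sum(1 for freed, _ in parse_sleep_lines(log_text) if freed < threshold_gib)
```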

The script to reproduce:

```shell
export PYTORCH_NPU_ALLOC_CONF="expandable_segments:True"
export IMAGE_MAX_TOKEN_NUM=1024
export USE_OPTIMIZED_MODEL=0
export ASCEND_RT_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
export NPROC_PER_NODE=8

swift rlhf \
  --rlhf_type grpo \
  --model /cache/Qwen3-VL-8B-Instruct/ \
  --gradient_checkpointing_kwargs '{"use_reentrant": false}' \
  --train_type lora \
  --use_vllm true \
  --vllm_mode colocate \
  --vllm_gpu_memory_utilization 0.8 \
  --vllm_max_model_len 4096 \
  --vllm_tensor_parallel_size 8 \
  --vllm_enable_prefix_caching false \
  --offload_optimizer true \
  --offload_model true \
  --sleep_level 1 \
  --dataset lmms-lab/multimodal-open-r1-8k-verified#1000 \
  --load_from_cache_file true \
  --external_plugins examples/train/grpo/plugin/plugin.py \
  --reward_funcs external_r1v_acc format \
  --reward_weights 1 0.1 \
  --torch_dtype bfloat16 \
  --attn_impl flash_attn \
  --num_train_epochs 1 \
  --lora_rank 64 \
  --lora_alpha 128 \
  --max_length 4096 \
  --per_device_train_batch_size 1 \
  --per_device_eval_batch_size 1 \
  --gradient_accumulation_steps 8 \
  --eval_steps 500 \
  --save_steps 500 \
  --learning_rate 5e-6 \
  --save_total_limit 2 \
  --logging_steps 1 \
  --warmup_ratio 0.0 \
  --dataloader_num_workers 4 \
  --max_completion_length 512 \
  --num_generations 16 \
  --steps_per_generation 32 \
  --deepspeed zero2 \
  --temperature 1.1 \
  --top_p 1.0 \
  --top_k 80 \
  --log_completions false \
  --async_generate false \
  --system examples/train/grpo/prompt.txt \
  --beta 0.001
```
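
To estimate how much the first sleep should have freed under this script, here is a back-of-the-envelope calculation of the weight memory each rank holds. It assumes roughly 8e9 parameters (taken from the model name, not an exact count), 2 bytes per parameter for `--torch_dtype bfloat16`, and an even split across `--vllm_tensor_parallel_size 8`. Any layers replicated across ranks would push the figure higher. Level-1 sleep in vLLM offloads the weights to CPU and discards the KV cache, so even this rough lower bound sits well above the 0.00 GiB reported.

```python
GiB = 1024 ** 3


def per_rank_weight_gib(num_params: float, bytes_per_param: int, tp_size: int) -> float:
    """Approximate weight memory per tensor-parallel rank, assuming an even shard."""
    return num_params * bytes_per_param / tp_size / GiB


# Assumptions: ~8e9 parameters, bfloat16 (2 bytes), tensor parallel size 8.
weights = per_rank_weight_gib(8e9, 2, 8)
print(f"expected weights per rank: {weights:.2f} GiB")  # about 1.86 GiB
```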

Nov 17 '25 09:11 llan-ml