ms-swift
The first sleep call does not release memory during initialization
env:
ms-swift==3.10.0
transformers==4.57.1
I added a print after the first sleep call here:
```python
...
context = self.offload_context if self.enable_offload else nullcontext
with context():
    self.engine = self._prepare_vllm_engine()
    if args.sleep_level > 0:
        time.sleep(5)
        self.engine.engine.reset_prefix_cache()
        self.engine.engine.sleep(args.sleep_level)
        print("=====================" * 12, "first sleep", args.sleep_level)
self.dynamic_num_samples = False  # grpo multi-turn
...
```
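To measure what the first sleep call releases from the trainer side, instead of relying only on vLLM's own log line, the call could be wrapped in a small helper. This is a sketch: `measure_sleep` is a hypothetical helper, and using `torch.npu.mem_get_info` (assumed to be provided by `torch_npu` and to return `(free_bytes, total_bytes)` like `torch.cuda.mem_get_info`) as the memory query is an assumption about this Ascend setup.

```python
from typing import Callable, Tuple

GiB = 1024 ** 3


def measure_sleep(sleep_fn: Callable[[], None],
                  mem_get_info: Callable[[], Tuple[int, int]]) -> float:
    """Call sleep_fn and return how much device memory it freed, in GiB.

    mem_get_info is assumed (hypothetically) to return (free_bytes, total_bytes),
    e.g. torch.cuda.mem_get_info or, on Ascend, torch.npu.mem_get_info.
    """
    free_before, _total = mem_get_info()
    sleep_fn()  # e.g. lambda: self.engine.engine.sleep(args.sleep_level)
    free_after, _total = mem_get_info()
    # A positive value means the device reports more free memory after sleeping.
    return (free_after - free_before) / GiB
```

In the patched snippet this would be called as `freed = measure_sleep(lambda: self.engine.engine.sleep(args.sleep_level), torch.npu.mem_get_info)`. The free-memory figure is device-wide, so other processes sharing the same device can skew the result.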
The log is as follows:
INFO 11-17 17:06:50 [worker_v1.py:171] Sleep mode freed 0.00 GiB memory, 23.66 GiB memory is still in use.
INFO 11-17 17:06:50 [executor_base.py:189] It took 0.000855 seconds to fall asleep.
============================================================================================================================================================================================================================================================ first sleep 1
INFO 11-17 17:06:50 [block_pool.py:378] Successfully reset prefix cache
INFO 11-17 17:06:50 [worker_v1.py:171] Sleep mode freed 0.00 GiB memory, 23.66 GiB memory is still in use.
INFO 11-17 17:06:50 [executor_base.py:189] It took 0.000849 seconds to fall asleep.
============================================================================================================================================================================================================================================================ first sleep 1
INFO 11-17 17:06:50 [block_pool.py:378] Successfully reset prefix cache
INFO 11-17 17:06:50 [worker_v1.py:171] Sleep mode freed 0.00 GiB memory, 23.66 GiB memory is still in use.
INFO 11-17 17:06:50 [executor_base.py:189] It took 0.000957 seconds to fall asleep.
============================================================================================================================================================================================================================================================ first sleep 1
INFO 11-17 17:06:50 [block_pool.py:378] Successfully reset prefix cache
INFO 11-17 17:06:50 [worker_v1.py:171] Sleep mode freed 0.00 GiB memory, 23.66 GiB memory is still in use.
INFO 11-17 17:06:50 [executor_base.py:189] It took 0.000891 seconds to fall asleep.
============================================================================================================================================================================================================================================================ first sleep 1
INFO 11-17 17:06:50 [block_pool.py:378] Successfully reset prefix cache
INFO 11-17 17:06:50 [worker_v1.py:171] Sleep mode freed 0.00 GiB memory, 23.66 GiB memory is still in use.
INFO 11-17 17:06:50 [executor_base.py:189] It took 0.000896 seconds to fall asleep.
============================================================================================================================================================================================================================================================ first sleep 1
INFO 11-17 17:06:50 [block_pool.py:378] Successfully reset prefix cache
INFO 11-17 17:06:50 [worker_v1.py:171] Sleep mode freed 0.00 GiB memory, 23.66 GiB memory is still in use.
INFO 11-17 17:06:50 [executor_base.py:189] It took 0.000873 seconds to fall asleep.
============================================================================================================================================================================================================================================================ first sleep 1
INFO 11-17 17:06:50 [block_pool.py:378] Successfully reset prefix cache
INFO 11-17 17:06:50 [worker_v1.py:171] Sleep mode freed 0.00 GiB memory, 23.66 GiB memory is still in use.
INFO 11-17 17:06:50 [executor_base.py:189] It took 0.000895 seconds to fall asleep.
============================================================================================================================================================================================================================================================ first sleep 1
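The same "freed 0.00 GiB" message repeats for every worker in the log. To check this over a longer training log, the `Sleep mode freed ... GiB memory, ... GiB memory is still in use.` lines could be parsed directly. The helpers below are a hypothetical sketch keyed only to the log format shown; because these lines carry no rank ID, results are indexed in log order, not by rank.

```python
import re
from typing import List, Tuple

# Matches the vLLM worker line seen in the log above, e.g.
# "Sleep mode freed 0.00 GiB memory, 23.66 GiB memory is still in use."
SLEEP_RE = re.compile(
    r"Sleep mode freed ([\d.]+) GiB memory, ([\d.]+) GiB memory is still in use"
)


def parse_sleep_log(text: str) -> List[Tuple[float, float]]:
    """Return (freed_gib, still_in_use_gib) for every sleep line, in log order."""
    return [(float(freed), float(in_use)) for freed, in_use in SLEEP_RE.findall(text)]


def sleeps_that_freed_nothing(text: str, eps: float = 1e-3) -> List[int]:
    """Log-order indices of sleep calls that released (almost) no memory."""
    return [i for i, (freed, _) in enumerate(parse_sleep_log(text)) if freed < eps]
```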
The script to reproduce:
export PYTORCH_NPU_ALLOC_CONF="expandable_segments:True"
export IMAGE_MAX_TOKEN_NUM=1024
export USE_OPTIMIZED_MODEL=0
export ASCEND_RT_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
export NPROC_PER_NODE=8
swift rlhf \
--rlhf_type grpo \
--model /cache/Qwen3-VL-8B-Instruct/ \
--gradient_checkpointing_kwargs '{"use_reentrant": false}' \
--train_type lora \
--use_vllm true \
--vllm_mode colocate \
--vllm_gpu_memory_utilization 0.8 \
--vllm_max_model_len 4096 \
--vllm_tensor_parallel_size 8 \
--vllm_enable_prefix_caching false \
--offload_optimizer true \
--offload_model true \
--sleep_level 1 \
--dataset lmms-lab/multimodal-open-r1-8k-verified#1000 \
--load_from_cache_file true \
--external_plugins examples/train/grpo/plugin/plugin.py \
--reward_funcs external_r1v_acc format \
--reward_weights 1 0.1 \
--torch_dtype bfloat16 \
--attn_impl flash_attn \
--num_train_epochs 1 \
--lora_rank 64 \
--lora_alpha 128 \
--max_length 4096 \
--per_device_train_batch_size 1 \
--per_device_eval_batch_size 1 \
--gradient_accumulation_steps 8 \
--eval_steps 500 \
--save_steps 500 \
--learning_rate 5e-6 \
--save_total_limit 2 \
--logging_steps 1 \
--warmup_ratio 0.0 \
--dataloader_num_workers 4 \
--max_completion_length 512 \
--num_generations 16 \
--steps_per_generation 32 \
--deepspeed zero2 \
--temperature 1.1 \
--top_p 1.0 \
--top_k 80 \
--log_completions false \
--async_generate false \
--system examples/train/grpo/prompt.txt \
--beta 0.001