
BUG: swap_size error when distributed-serving very large LMs

MM-IR opened this issue 1 year ago · 4 comments

Hi, I ran into another issue: "RuntimeError: Aborted due to the lack of CPU swap space. Please increase the swap space to avoid this error."

What does this error mean?

MM-IR · Jul 26 '23 13:07
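
Editorial note: this error appears to be raised when the scheduler preempts running sequences by swapping their KV-cache blocks to CPU and runs out of swap blocks. A minimal sketch of raising the knob the message mentions, assuming the `swap_space` engine argument (CPU swap allocation per GPU, in GiB; the model name is just a placeholder):

```python
from vllm import LLM, SamplingParams

# swap_space is the CPU swap allocation per GPU in GiB (default 4);
# raising it gives preempted sequences more room to be swapped out.
llm = LLM(
    model="facebook/opt-13b",   # placeholder model
    tensor_parallel_size=4,
    swap_space=16,
)
outputs = llm.generate(["Hello, my name is"],
                       SamplingParams(max_tokens=32))
print(outputs[0].outputs[0].text)
```

The API servers expose the same setting as the `--swap-space` command-line flag.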

I just noticed that there have been some updates to that file (vllm/core/scheduler.py, around lines 384-395).

```python
if preemption_mode is None:
    seqs = seq_group.get_seqs(status=SequenceStatus.RUNNING)
    if len(seqs) == 1:
        preemption_mode = PreemptionMode.RECOMPUTE
    else:
        preemption_mode = PreemptionMode.SWAP

if preemption_mode == PreemptionMode.RECOMPUTE:
    self._preempt_by_recompute(seq_group)
elif preemption_mode == PreemptionMode.SWAP:
    self._preempt_by_swap(seq_group, blocks_to_swap_out)
else:
    assert False, "Invalid preemption mode."
```

Could I ask for more clarification on this? Thanks so much in advance.

MM-IR · Jul 27 '23 09:07
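
Editorial note: reading the snippet above, a sequence group with a single running sequence is preempted by recomputation (its blocks are freed and the sequence is re-run later), while a group with multiple sequences is preempted by swapping its blocks to CPU, and that swap path is what consumes swap space. A hedged sketch of the kind of request that would take the SWAP path, assuming that reading is right (the model name is a placeholder):

```python
from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-1.3b", swap_space=8)  # placeholder model

# n/best_of > 1 creates a multi-sequence group, so under memory
# pressure the scheduler preempts it via SWAP rather than RECOMPUTE.
params = SamplingParams(n=4, best_of=4, max_tokens=64)
outputs = llm.generate(["Once upon a time"], params)
```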

Does this mean that my process is overflowing CPU memory?

I am running on 8 GPUs (A5000s), but 4 of them are occupied by other large-scale inference jobs, so I am considering launching the next inference run on the remaining 4...

Thanks very much in advance.

MM-IR · Jul 27 '23 09:07
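
Editorial note: one way to keep vLLM off the busy devices is to restrict GPU visibility before anything initializes CUDA in the process. A minimal sketch, assuming the free GPUs are physical indices 4-7 (model name is a placeholder):

```python
import os

# Must be set before torch/vllm initialize CUDA in this process.
os.environ["CUDA_VISIBLE_DEVICES"] = "4,5,6,7"  # hypothetical free GPUs

from vllm import LLM

# tensor_parallel_size=4 shards the model across the 4 visible GPUs.
llm = LLM(model="facebook/opt-13b", tensor_parallel_size=4)  # placeholder
```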

BTW, since I need fine-grained control over CUDA_VISIBLE_DEVICES when running the vLLM API: how do I assign another specific GPU (e.g., cuda:6) to auxiliary models in my program, while the main large model uses the devices set via CUDA_VISIBLE_DEVICES (through os.environ?)?

MM-IR · Jul 27 '23 09:07
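
Editorial note: one caveat here is that CUDA renumbers devices relative to CUDA_VISIBLE_DEVICES, so "cuda:6" only exists in-process if at least seven devices are visible. A sketch of the remapping, assuming a PyTorch auxiliary model (model names are placeholders):

```python
import os

# With this mask, physical GPUs 4,5,6,7 become cuda:0..cuda:3 in this
# process, so physical GPU 6 must be addressed as cuda:2 here.
os.environ["CUDA_VISIBLE_DEVICES"] = "4,5,6,7"

import torch
from transformers import AutoModel  # hypothetical auxiliary model

aux_model = AutoModel.from_pretrained("bert-base-uncased").to("cuda:2")
```

Running the auxiliary model in a separate process with its own CUDA_VISIBLE_DEVICES=6 avoids the remapping arithmetic altogether.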

For now, we have found a workaround: set the swap space directly to 0. That way the CPU swap space is never used and the error is never raised. The number of CPU blocks also becomes 0, which may slow things down a bit, but at least it does not hang and die.

chi2liu · Jan 04 '24 07:01
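
Editorial note: for reference, the workaround above corresponds to passing swap_space=0, which allocates zero CPU blocks so the swap-out path is never exercised (per the comment, at some cost in speed). A sketch, with the model name again a placeholder:

```python
from vllm import LLM

# Zero CPU swap blocks: the swap-out path can never run out of space
# because it is never used; per the comment above, this may cost speed.
llm = LLM(model="facebook/opt-13b", swap_space=0)  # placeholder model
```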