kvcached available_size()=9.0 < need

When I process a small sample, around 40k samples, it works just fine. But a real run uses a dataset of at least 200k samples, and it fails because the available KV cache is insufficient. What would you recommend to work around or fix this? @jiarong0907 @ivanium

Adding requests: 100%|██████████████████████████████████████████████████████████████████████████████████████| 512/512 [00:09<00:00, 55.78it/s]
Adding requests:  88%|████████████████████████████████████████████████████████████████████████████          | 453/512 [00:06<00:00, 79.11it/s](EngineCore_DP0 pid=10311) [kvcached][WARNING][2025-10-25 11:17:27][kv_cache_manager.py:164] available_size()=9.0 < need_size=30 10.80 toks/s]
Processed prompts:   6%|█▋                            | 29/512 [00:08<02:31,  3.19it/s, est. speed input: 1680.77 toks/s, output: 7.16 toks/s](EngineCore_DP0 pid=10311) ERROR 10-25 11:17:27 [core.py:720] EngineCore encountered a fatal error.
(EngineCore_DP0 pid=10311) ERROR 10-25 11:17:27 [core.py:720] Traceback (most recent call last):
(EngineCore_DP0 pid=10311) ERROR 10-25 11:17:27 [core.py:720]   File "/opt/venv/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 711, in run_engine_core
(EngineCore_DP0 pid=10311) ERROR 10-25 11:17:27 [core.py:720]     engine_core.run_busy_loop()
(EngineCore_DP0 pid=10311) ERROR 10-25 11:17:27 [core.py:720]   File "/opt/venv/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 738, in run_busy_loop
(EngineCore_DP0 pid=10311) ERROR 10-25 11:17:27 [core.py:720]     self._process_engine_step()
(EngineCore_DP0 pid=10311) ERROR 10-25 11:17:27 [core.py:720]   File "/opt/venv/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 764, in _process_engine_step
(EngineCore_DP0 pid=10311) ERROR 10-25 11:17:27 [core.py:720]     outputs, model_executed = self.step_fn()
(EngineCore_DP0 pid=10311) ERROR 10-25 11:17:27 [core.py:720]                               ^^^^^^^^^^^^^^
(EngineCore_DP0 pid=10311) ERROR 10-25 11:17:27 [core.py:720]   File "/opt/venv/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 291, in step
(EngineCore_DP0 pid=10311) ERROR 10-25 11:17:27 [core.py:720]     scheduler_output = self.scheduler.schedule()
(EngineCore_DP0 pid=10311) ERROR 10-25 11:17:27 [core.py:720]                        ^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=10311) ERROR 10-25 11:17:27 [core.py:720]   File "/opt/venv/lib/python3.12/site-packages/vllm/v1/core/sched/scheduler.py", line 474, in schedule
(EngineCore_DP0 pid=10311) ERROR 10-25 11:17:27 [core.py:720]     new_blocks = self.kv_cache_manager.allocate_slots(
(EngineCore_DP0 pid=10311) ERROR 10-25 11:17:27 [core.py:720]                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=10311) ERROR 10-25 11:17:27 [core.py:720]   File "/opt/venv/lib/python3.12/site-packages/vllm/v1/core/kv_cache_manager.py", line 287, in allocate_slots
(EngineCore_DP0 pid=10311) ERROR 10-25 11:17:27 [core.py:720]     new_blocks = self.coordinator.allocate_new_blocks(
(EngineCore_DP0 pid=10311) ERROR 10-25 11:17:27 [core.py:720]                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=10311) ERROR 10-25 11:17:27 [core.py:720]   File "/opt/venv/lib/python3.12/site-packages/vllm/v1/core/kv_cache_coordinator.py", line 112, in allocate_new_blocks
(EngineCore_DP0 pid=10311) ERROR 10-25 11:17:27 [core.py:720]     return tuple(
(EngineCore_DP0 pid=10311) ERROR 10-25 11:17:27 [core.py:720]            ^^^^^^
(EngineCore_DP0 pid=10311) ERROR 10-25 11:17:27 [core.py:720]   File "/opt/venv/lib/python3.12/site-packages/vllm/v1/core/kv_cache_coordinator.py", line 113, in <genexpr>
(EngineCore_DP0 pid=10311) ERROR 10-25 11:17:27 [core.py:720]     manager.allocate_new_blocks(
(EngineCore_DP0 pid=10311) ERROR 10-25 11:17:27 [core.py:720]   File "/opt/venv/lib/python3.12/site-packages/vllm/v1/core/single_type_kv_cache_manager.py", line 129, in allocate_new_blocks
(EngineCore_DP0 pid=10311) ERROR 10-25 11:17:27 [core.py:720]     new_blocks = self.block_pool.get_new_blocks(num_new_blocks)
(EngineCore_DP0 pid=10311) ERROR 10-25 11:17:27 [core.py:720]                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=10311) ERROR 10-25 11:17:27 [core.py:720]   File "/opt/venv/lib/python3.12/site-packages/kvcached/integration/vllm/patches.py", line 90, in get_new_blocks
(EngineCore_DP0 pid=10311) ERROR 10-25 11:17:27 [core.py:720]     assert block_ids is not None and len(block_ids) == num_blocks
(EngineCore_DP0 pid=10311) ERROR 10-25 11:17:27 [core.py:720]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=10311) ERROR 10-25 11:17:27 [core.py:720] AssertionError
Error in run_dataloader: EngineCore encountered an issue. See stack trace (above) for the root cause.

Oct 25 '25 18:10 alecngo

Thanks for the issue! It looks like kvcached initially thought memory was available but then found it insufficient during the actual allocation. To help debug, could you tell us the settings under which you hit this? For example, how many instances you are running, what GPU memory utilization is set for each instance, etc.

Oct 25 '25 20:10 ivanium

Hi @ivanium, I was running 3 instances of Qwen2 7B on an A100_80GB. I did not set gpu_memory_utilization, so it should default to 90%. I thought kvcached would manage the KV memory for me, but it seems it does not?

Oct 25 '25 22:10 alecngo

Correcting myself: looking at the engine setup, I had set gpu_memory_utilization to 0.5, and that run failed. Increasing it to 0.8 helped. I think we can close this unless the team wants it to fail earlier, at the first iteration.

Oct 26 '25 02:10 alecngo
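
For a rough sense of why a smaller memory budget runs out of KV cache on long runs, here is a minimal back-of-envelope sketch. The model numbers (28 layers, 4 KV heads, head dimension 128, 2-byte dtype) are assumptions based on the commonly published Qwen2-7B config and should be checked against the model's own config.json. The budget values are purely illustrative, and nothing here reflects how kvcached itself accounts for memory.

```python
# Back-of-envelope KV cache sizing. The model parameters used in the example
# are ASSUMPTIONS (commonly published Qwen2-7B values); verify them against
# your model's config.json.

GIB = 1024 ** 3


def kv_bytes_per_token(num_layers: int, num_kv_heads: int,
                       head_dim: int, dtype_bytes: int) -> int:
    """Bytes of KV cache one token occupies: keys + values across all layers."""
    return 2 * num_layers * num_kv_heads * head_dim * dtype_bytes


def kv_token_capacity(kv_budget_bytes: int, bytes_per_token: int) -> int:
    """Number of whole tokens whose KV entries fit in the given budget."""
    return kv_budget_bytes // bytes_per_token


if __name__ == "__main__":
    per_token = kv_bytes_per_token(num_layers=28, num_kv_heads=4,
                                   head_dim=128, dtype_bytes=2)
    print(f"KV bytes per token: {per_token}")
    # Hypothetical KV budgets, only to show how capacity scales with budget.
    for budget_gib in (10, 20, 40):
        tokens = kv_token_capacity(budget_gib * GIB, per_token)
        print(f"{budget_gib} GiB -> about {tokens} tokens of KV cache")
```

Whatever budget is left for KV cache is shared by all instances on the GPU, so each instance's share shrinks as instances are added.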

Got it! Thanks for debugging this.

When kvcached is enabled, we suggest not setting gpu_memory_utilization at all. kvcached will try to use as much GPU memory as possible for memory efficiency.

Oct 26 '25 02:10 jiarong0907

It seems we need to revisit this at some point. I did not set gpu_memory_utilization, but it is still hit-or-miss. Other variables can factor in, including batch_size and max_num_seqs. Sometimes the engine does not fail until several rounds of iteration have completed, and it would be nice to fail early when the engine setup is inappropriate.

Oct 27 '25 19:10 alecngo
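
One way the "fail early" idea could look is a static preflight check run before any requests are submitted. This is only a sketch: `blocks_needed` and `preflight_check` are hypothetical helpers, not part of vLLM or kvcached. The caller is assumed to know the number of available KV blocks, and the block size of 16 tokens is an assumption.

```python
import math


def blocks_needed(num_seqs: int, tokens_per_seq: int, block_size: int) -> int:
    """Worst-case KV blocks if num_seqs sequences each reach tokens_per_seq."""
    return num_seqs * math.ceil(tokens_per_seq / block_size)


def preflight_check(available_blocks: int, max_num_seqs: int,
                    max_tokens_per_seq: int, block_size: int = 16) -> int:
    """Raise before the run starts if the worst case cannot fit.

    Hypothetical helper: how available_blocks is obtained is left to the
    caller and is not an existing vLLM or kvcached API.
    """
    need = blocks_needed(max_num_seqs, max_tokens_per_seq, block_size)
    if need > available_blocks:
        raise RuntimeError(
            f"KV cache too small: need up to {need} blocks, "
            f"only {available_blocks} available; lower max_num_seqs "
            f"or the per-sequence token limit"
        )
    return need
```

The worst-case bound is deliberately conservative, since a scheduler can usually preempt or queue requests instead of failing. A static check like this also cannot detect shortages that only arise at runtime, for example when other instances on the same GPU claim memory in the meantime.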

Replied in #197. I agree they are the same issue and I still highly suspect this is a race condition. Will follow up there.

Oct 27 '25 23:10 ivanium