Data preprocessing on a single RTX 3090: No available memory for the cache blocks.
When I run `weclone-cli make-dataset` for data preprocessing, it fails with the error below:
INFO 05-16 18:43:01 [loader.py:447] Loading weights took 2.60 seconds
INFO 05-16 18:43:01 [gpu_model_runner.py:1186] Model loading took 14.2487 GB and 2.777985 seconds
INFO 05-16 18:43:07 [backends.py:415] Using cache directory: /home/118/.cache/vllm/torch_compile_cache/834147faf0/rank_0_0 for vLLM's torch.compile
INFO 05-16 18:43:07 [backends.py:425] Dynamo bytecode transform time: 5.80 s
INFO 05-16 18:43:07 [backends.py:115] Directly load the compiled graph for shape None from the cache
INFO 05-16 18:43:12 [monitor.py:33] torch.compile takes 5.80 s in total
ERROR 05-16 18:43:14 [core.py:343] EngineCore hit an exception: Traceback (most recent call last):
ERROR 05-16 18:43:14 [core.py:343] File "/home/118/WeClone/.venv/lib/python3.10/site-packages/vllm/v1/engine/core.py", line 335, in run_engine_core
ERROR 05-16 18:43:14 [core.py:343] engine_core = EngineCoreProc(*args, **kwargs)
ERROR 05-16 18:43:14 [core.py:343] File "/home/118/WeClone/.venv/lib/python3.10/site-packages/vllm/v1/engine/core.py", line 290, in __init__
ERROR 05-16 18:43:14 [core.py:343] super().__init__(vllm_config, executor_class, log_stats)
ERROR 05-16 18:43:14 [core.py:343] File "/home/118/WeClone/.venv/lib/python3.10/site-packages/vllm/v1/engine/core.py", line 63, in __init__
ERROR 05-16 18:43:14 [core.py:343] num_gpu_blocks, num_cpu_blocks = self._initialize_kv_caches(
ERROR 05-16 18:43:14 [core.py:343] File "/home/118/WeClone/.venv/lib/python3.10/site-packages/vllm/v1/engine/core.py", line 126, in _initialize_kv_caches
ERROR 05-16 18:43:14 [core.py:343] kv_cache_configs = [
ERROR 05-16 18:43:14 [core.py:343] File "/home/118/WeClone/.venv/lib/python3.10/site-packages/vllm/v1/engine/core.py", line 127, in <listcomp>
ERROR 05-16 18:43:14 [core.py:343] get_kv_cache_config(vllm_config, kv_cache_spec_one_worker,
ERROR 05-16 18:43:14 [core.py:343] File "/home/118/WeClone/.venv/lib/python3.10/site-packages/vllm/v1/core/kv_cache_utils.py", line 604, in get_kv_cache_config
ERROR 05-16 18:43:14 [core.py:343] check_enough_kv_cache_memory(vllm_config, kv_cache_spec, available_memory)
ERROR 05-16 18:43:14 [core.py:343] File "/home/118/WeClone/.venv/lib/python3.10/site-packages/vllm/v1/core/kv_cache_utils.py", line 468, in check_enough_kv_cache_memory
ERROR 05-16 18:43:14 [core.py:343] raise ValueError("No available memory for the cache blocks. "
ERROR 05-16 18:43:14 [core.py:343] ValueError: No available memory for the cache blocks. Try increasing `gpu_memory_utilization` when initializing the engine.
ERROR 05-16 18:43:14 [core.py:343]
CRITICAL 05-16 18:43:14 [core_client.py:269] Got fatal signal from worker processes, shutting down. See stack trace above for root cause issue.
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 570.133.20 Driver Version: 570.133.20 CUDA Version: 12.8 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA GeForce RTX 3090 Off | 00000000:31:00.0 On | N/A |
| 36% 43C P8 30W / 350W | 1317MiB / 24576MiB | 2% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| 0 N/A N/A 1926 G /usr/lib/xorg/Xorg 574MiB |
| 0 N/A N/A 2875 C+G ...c/gnome-remote-desktop-daemon 258MiB |
| 0 N/A N/A 2925 G /usr/bin/gnome-shell 150MiB |
| 0 N/A N/A 53165 G /opt/google/chrome/chrome 4MiB |
| 0 N/A N/A 53215 G ...ersion=20250515-180047.882000 257MiB |
+-----------------------------------------------------------------------------------------+
Try adding "gpu_memory_utilization": 0.95 to the engine_args in weclone/core/inference/vllm_infer.py.
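For reference, a minimal sketch of where that key would go, assuming engine_args is a plain dict that gets unpacked into vLLM's LLM constructor (I haven't reproduced the rest of vllm_infer.py here, so keep the existing entries as they are):

```python
from vllm import LLM

# Sketch only -- the real engine_args in vllm_infer.py already has other
# entries (model path, dtype, ...); just add this one key to it.
engine_args = {
    # ... existing WeClone entries, unchanged ...
    "gpu_memory_utilization": 0.95,  # fraction of VRAM vLLM may reserve (vLLM's default is 0.9)
}
llm = LLM(**engine_args)  # assumes the dict is unpacked into vLLM's LLM constructor
```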
Nice, that did it. "gpu_memory_utilization": 0.9 wasn't enough when I tried it earlier, but "gpu_memory_utilization": 0.95 works.
ERROR 05-16 23:47:16 [core.py:343] torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 1.62 GiB. GPU 0 has a total capacity of 39.38 GiB of which 987.38 MiB is free. Process 999655 has 38.41 GiB memory in use. Of the allocated memory 35.38 GiB is allocated by PyTorch, with 97.50 MiB allocated in private pools (e.g., CUDA Graphs), and 2.18 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
ERROR 05-16 23:47:16 [core.py:343]
CRITICAL 05-16 23:47:16 [core_client.py:269] Got fatal signal from worker processes, shutting down. See stack trace above for root cause issue.
Killed
I'm hitting something similar; changing the parameter didn't help either.
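The trace above also suggests PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True. That only mitigates fragmentation of memory PyTorch has already reserved, so it may not be enough on a genuinely small card, but here is a minimal sketch of setting it (it has to take effect before the first CUDA allocation):

```python
import os

# Set this before torch/vLLM touch the GPU, e.g. at the top of the
# preprocessing entry point; the shell equivalent would be:
#   PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True weclone-cli make-dataset
os.environ.setdefault("PYTORCH_CUDA_ALLOC_CONF", "expandable_segments:True")
```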
Same problem here; the error still appears after adding the parameter.
ERROR 05-17 03:31:17 [core.py:343] raise ValueError("No available memory for the cache blocks. "
ERROR 05-17 03:31:17 [core.py:343] ValueError: No available memory for the cache blocks. Try increasing gpu_memory_utilization when initializing the engine.
ERROR 05-17 03:31:17 [core.py:343]
CRITICAL 05-17 03:31:17 [core_client.py:269] Got fatal signal from worker processes, shutting down. See stack trace above for root cause issue.
I suspect the 4060 Ti simply doesn't have enough VRAM for this step.
Roughly how much VRAM does this step need?
I checked while it was running: VRAM was maxed out, and this process was using over 22 GB.
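For anyone trying to gauge the headroom on their own card, a quick check of free vs. total VRAM before the engine starts (assuming PyTorch is installed in the project venv; desktop processes like Xorg and gnome-shell count against the total too):

```python
import torch

# Free / total VRAM on GPU 0, in GiB.
free_bytes, total_bytes = torch.cuda.mem_get_info(0)
print(f"free: {free_bytes / 1024**3:.2f} GiB / total: {total_bytes / 1024**3:.2f} GiB")
```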
Is there any other way around this? (facepalm)
In the end I just didn't enable sensitive-information filtering. Things I tried:
- The article on deploying under Windows linked at the very top of the README mentions a setting you can enable if you're short on VRAM, but with it enabled the run aborted partway through and produced no results. Not sure why.