Data preprocessing on a single RTX 3090: No available memory for the cache blocks.
When I run `weclone-cli make-dataset` for data preprocessing, it fails with the error below:
INFO 05-16 18:43:01 [loader.py:447] Loading weights took 2.60 seconds
INFO 05-16 18:43:01 [gpu_model_runner.py:1186] Model loading took 14.2487 GB and 2.777985 seconds
INFO 05-16 18:43:07 [backends.py:415] Using cache directory: /home/118/.cache/vllm/torch_compile_cache/834147faf0/rank_0_0 for vLLM's torch.compile
INFO 05-16 18:43:07 [backends.py:425] Dynamo bytecode transform time: 5.80 s
INFO 05-16 18:43:07 [backends.py:115] Directly load the compiled graph for shape None from the cache
INFO 05-16 18:43:12 [monitor.py:33] torch.compile takes 5.80 s in total
ERROR 05-16 18:43:14 [core.py:343] EngineCore hit an exception: Traceback (most recent call last):
ERROR 05-16 18:43:14 [core.py:343] File "/home/118/WeClone/.venv/lib/python3.10/site-packages/vllm/v1/engine/core.py", line 335, in run_engine_core
ERROR 05-16 18:43:14 [core.py:343] engine_core = EngineCoreProc(*args, **kwargs)
ERROR 05-16 18:43:14 [core.py:343] File "/home/118/WeClone/.venv/lib/python3.10/site-packages/vllm/v1/engine/core.py", line 290, in __init__
ERROR 05-16 18:43:14 [core.py:343] super().__init__(vllm_config, executor_class, log_stats)
ERROR 05-16 18:43:14 [core.py:343] File "/home/118/WeClone/.venv/lib/python3.10/site-packages/vllm/v1/engine/core.py", line 63, in __init__
ERROR 05-16 18:43:14 [core.py:343] num_gpu_blocks, num_cpu_blocks = self._initialize_kv_caches(
ERROR 05-16 18:43:14 [core.py:343] File "/home/118/WeClone/.venv/lib/python3.10/site-packages/vllm/v1/engine/core.py", line 126, in _initialize_kv_caches
ERROR 05-16 18:43:14 [core.py:343] kv_cache_configs = [
ERROR 05-16 18:43:14 [core.py:343] File "/home/118/WeClone/.venv/lib/python3.10/site-packages/vllm/v1/engine/core.py", line 127, in <listcomp>
ERROR 05-16 18:43:14 [core.py:343] get_kv_cache_config(vllm_config, kv_cache_spec_one_worker,
ERROR 05-16 18:43:14 [core.py:343] File "/home/118/WeClone/.venv/lib/python3.10/site-packages/vllm/v1/core/kv_cache_utils.py", line 604, in get_kv_cache_config
ERROR 05-16 18:43:14 [core.py:343] check_enough_kv_cache_memory(vllm_config, kv_cache_spec, available_memory)
ERROR 05-16 18:43:14 [core.py:343] File "/home/118/WeClone/.venv/lib/python3.10/site-packages/vllm/v1/core/kv_cache_utils.py", line 468, in check_enough_kv_cache_memory
ERROR 05-16 18:43:14 [core.py:343] raise ValueError("No available memory for the cache blocks. "
ERROR 05-16 18:43:14 [core.py:343] ValueError: No available memory for the cache blocks. Try increasing `gpu_memory_utilization` when initializing the engine.
ERROR 05-16 18:43:14 [core.py:343]
CRITICAL 05-16 18:43:14 [core_client.py:269] Got fatal signal from worker processes, shutting down. See stack trace above for root cause issue.
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 570.133.20 Driver Version: 570.133.20 CUDA Version: 12.8 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA GeForce RTX 3090 Off | 00000000:31:00.0 On | N/A |
| 36% 43C P8 30W / 350W | 1317MiB / 24576MiB | 2% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| 0 N/A N/A 1926 G /usr/lib/xorg/Xorg 574MiB |
| 0 N/A N/A 2875 C+G ...c/gnome-remote-desktop-daemon 258MiB |
| 0 N/A N/A 2925 G /usr/bin/gnome-shell 150MiB |
| 0 N/A N/A 53165 G /opt/google/chrome/chrome 4MiB |
| 0 N/A N/A 53215 G ...ersion=20250515-180047.882000 257MiB |
+-----------------------------------------------------------------------------------------+
Try adding "gpu_memory_utilization": 0.95 to the engine_args in weclone/core/inference/vllm_infer.py.
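For reference, a minimal sketch of where that key would go, assuming engine_args is a plain dict that gets unpacked into vLLM's LLM constructor (I haven't reproduced the rest of vllm_infer.py here, so keep the existing entries as they are):

```python
from vllm import LLM

# Sketch only -- the real engine_args in vllm_infer.py already has other
# entries (model path, dtype, ...); just add this one key to it.
engine_args = {
    # ... existing WeClone entries, unchanged ...
    "gpu_memory_utilization": 0.95,  # fraction of VRAM vLLM may reserve (vLLM's default is 0.9)
}
llm = LLM(**engine_args)  # assumes the dict is unpacked into vLLM's LLM constructor
```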
Nice, that did it. "gpu_memory_utilization": 0.9 wasn't enough when I tried it earlier, but "gpu_memory_utilization": 0.95 works.
ERROR 05-16 23:47:16 [core.py:343] torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 1.62 GiB. GPU 0 has a total capacity of 39.38 GiB of which 987.38 MiB is free. Process 999655 has 38.41 GiB memory in use. Of the allocated memory 35.38 GiB is allocated by PyTorch, with 97.50 MiB allocated in private pools (e.g., CUDA Graphs), and 2.18 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
ERROR 05-16 23:47:16 [core.py:343]
CRITICAL 05-16 23:47:16 [core_client.py:269] Got fatal signal from worker processes, shutting down. See stack trace above for root cause issue.
Killed
I'm hitting something similar; changing the parameter didn't help either.
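The trace above also suggests PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True. That only mitigates fragmentation of memory PyTorch has already reserved, so it may not be enough on a genuinely small card, but here is a minimal sketch of setting it (it has to take effect before the first CUDA allocation):

```python
import os

# Set this before torch/vLLM touch the GPU, e.g. at the top of the
# preprocessing entry point; the shell equivalent would be:
#   PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True weclone-cli make-dataset
os.environ.setdefault("PYTORCH_CUDA_ALLOC_CONF", "expandable_segments:True")
```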
Same problem here; the error still appears after adding the parameter.
ERROR 05-17 03:31:17 [core.py:343] raise ValueError("No available memory for the cache blocks. "
ERROR 05-17 03:31:17 [core.py:343] ValueError: No available memory for the cache blocks. Try increasing gpu_memory_utilization when initializing the engine.
ERROR 05-17 03:31:17 [core.py:343]
CRITICAL 05-17 03:31:17 [core_client.py:269] Got fatal signal from worker processes, shutting down. See stack trace above for root cause issue.
I suspect the 4060 Ti simply doesn't have enough VRAM for this step.
Roughly how much VRAM does this step need?
I checked while it was running: VRAM was maxed out, and this process was using over 22 GB.
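For anyone trying to gauge the headroom on their own card, a quick check of free vs. total VRAM before the engine starts (assuming PyTorch is installed in the project venv; desktop processes like Xorg and gnome-shell count against the total too):

```python
import torch

# Free / total VRAM on GPU 0, in GiB.
free_bytes, total_bytes = torch.cuda.mem_get_info(0)
print(f"free: {free_bytes / 1024**3:.2f} GiB / total: {total_bytes / 1024**3:.2f} GiB")
```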
Is there any other way around this? (facepalm)
In the end I just didn't enable sensitive-information filtering. Things I tried:
- The article on deploying under Windows linked at the very top of the README mentions a setting you can enable if you're short on VRAM, but with it enabled the run aborted partway through and produced no results. Not sure why.