
CUDA out of memory.

Open · TreasureHunter opened this issue 7 months ago · 0 comments

Excellent work! In Section 4.1.4 you mention that the experiments were run on a single RTX 4090 24GB. I tested LongBench with Llama 3.1 and hit an OOM error on the very first task, narrativeqa. The run_llama.sh file you provide sets device to 0 and 1; I set device to 0 only and ran on a single 4090, keeping all other settings identical to the sh file.
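For reference, my single-GPU change amounts to exposing only GPU 0 to the process before PyTorch initializes CUDA. A minimal sketch of what I mean (whether run_llama.sh maps its device setting onto CUDA_VISIBLE_DEVICES is my assumption; I only edited the value in the script):

    import os

    # Expose only GPU 0 to this process; must be set before torch touches CUDA.
    # Assumption: run_llama.sh's "device" value takes effect this way (or via an
    # equivalent launcher argument); I only changed it from "0,1" to "0".
    os.environ["CUDA_VISIBLE_DEVICES"] = "0"

    import torch  # noqa: E402
    print(torch.cuda.device_count())  # expected on my setup: 1

The OOM traceback: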

Traceback (most recent call last):
  File "/root/PQCache/vq_pred.py", line 463, in <module>
    get_pred(args, model, tokenizer, 0, world_size, data_all, max_length, max_gen,
  File "/root/PQCache/vq_pred.py", line 178, in get_pred
    output = model.generate(
  File "/root/miniconda3/envs/pqcache/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/root/miniconda3/envs/pqcache/lib/python3.10/site-packages/transformers/generation/utils.py", line 1989, in generate
    result = self._sample(
  File "/root/miniconda3/envs/pqcache/lib/python3.10/site-packages/transformers/generation/utils.py", line 2932, in _sample
    outputs = self(**model_inputs, return_dict=True)
  File "/root/miniconda3/envs/pqcache/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/root/miniconda3/envs/pqcache/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/PQCache/vq_method/llama31_patch.py", line 461, in forward
    outputs = self.model(
  File "/root/miniconda3/envs/pqcache/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/root/miniconda3/envs/pqcache/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/PQCache/vq_method/llama31_patch.py", line 312, in forward
    layer_outputs = decoder_layer(
  File "/root/miniconda3/envs/pqcache/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/root/miniconda3/envs/pqcache/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/miniconda3/envs/pqcache/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 693, in forward
    hidden_states = self.mlp(hidden_states)
  File "/root/miniconda3/envs/pqcache/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/root/miniconda3/envs/pqcache/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/miniconda3/envs/pqcache/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 253, in forward
    down_proj = self.down_proj(self.act_fn(self.gate_proj(x)) * self.up_proj(x))
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 1.25 GiB. GPU 0 has a total capacty of 23.53 GiB of which 1.21 GiB is free. Including non-PyTorch memory, this process has 21.90 GiB memory in use. Process 34887 has 384.00 MiB memory in use. Of the allocated memory 21.35 GiB is allocated by PyTorch, and 76.91 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
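
As a footnote, the last line of the error suggests tuning the caching allocator to reduce fragmentation. A minimal sketch of that setting (the value 128 is an illustrative assumption; I have not verified that it avoids this OOM):

    import os

    # Must be set before torch initializes CUDA for it to take effect.
    # max_split_size_mb caps the block size the caching allocator will split,
    # which can reduce fragmentation per the PyTorch memory-management docs.
    os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

    import torch  # noqa: E402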

Looking forward to your reply. Many thanks!

TreasureHunter · May 28, 2025 07:05