swift: how to fix an error when running inference on a quantized model


Open • greatheart1000 opened this issue 1 month ago • 1 comment

Describe the bug
Running inference on the quantized model fails with an error. Command used:
CUDA_VISIBLE_DEVICES=0 swift infer --model_type baichuan2-7b --model_id_or_path baichuan2-7b-gptq-int4

Your hardware and system info
File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/layers/quantization/gptq.py", line 208, in apply_weights
    output = ops.gptq_gemm(reshaped_x, weights["qweight"],
RuntimeError: CUDA out of memory. Tried to allocate 172.00 MiB. GPU 0 has a total capacty of 22.19 GiB of which 14.50 MiB is free. Process 832 has 1.31 GiB memory in use. Process 3711 has 1.31 GiB memory in use. Including non-PyTorch memory, this process has 19.55 GiB memory in use. Of the allocated memory 17.90 GiB is allocated by PyTorch, and 155.89 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
Exception raised from malloc at ../c10/cuda/CUDACachingAllocator.cpp:1438 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x57 (0x7f1f7070f617 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so)
frame #1: + 0x30f6c (0x7f1f707adf6c in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10_cuda.so)
frame #2: + 0x3139e (0x7f1f707ae39e in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10_cuda.so)
frame #3: + 0x3175e (0x7f1f707ae75e in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10_cuda.so)
frame #4: + 0x16c1461 (0x7f1f2eaf7461 in /usr/local/lib/python3.10/dist-packages/torch/lib/libtorch_cpu.so)
frame #5: at::detail::empty_generic(c10::ArrayRef, c10::Allocator*, c10::DispatchKeySet, c10::ScalarType, c10::optional<c10::MemoryFormat>) + 0x14 (0x7f1f2eaef674 in /usr/local/lib/python3.10/dist-packages/torch/lib/libtorch_cpu.so)
frame #6: at::detail::empty_cuda(c10::ArrayRef, c10::ScalarType, c10::optional<c10::Device>, c10::optional<c10::MemoryFormat>) + 0x111 (0x7f1f029a4061 in /usr/local/lib/python3.10/dist-packages/torch/lib/libtorch_cuda.so)
frame #7: at::detail::empty_cuda(c10::ArrayRef, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional, c10::optional<c10::MemoryFormat>) + 0x31 (0x7f1f029a4331 in /usr/local/lib/python3.10/dist-packages/torch/lib/libtorch_cuda.so)
frame #8: at::native::empty_cuda(c10::ArrayRef, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional, c10::optional<c10::MemoryFormat>) + 0x20 (0x7f1f02ad13c0 in /usr/local/lib/python3.10/dist-packages/torch/lib/libtorch_cuda.so)
frame #9: + 0x2d403a9 (0x7f1f048bc3a9 in /usr/local/lib/python3.10/dist-packages/torch/lib/libtorch_cuda.so)
frame #10: + 0x2d4048b (0x7f1f048bc48b in /usr/local/lib/python3.10/dist-packages/torch/lib/libtorch_cuda.so)
frame #11: at::_ops::empty_memory_format::redispatch(c10::DispatchKeySet, c10::ArrayRef<c10::SymInt>, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional, c10::optional<c10::MemoryFormat>) + 0xe7 (0x7f1f2fa1d277 in /usr/local/lib/python3.10/dist-packages/torch/lib/libtorch_cpu.so)
frame #12: + 0x295eaef (0x7f1f2fd94aef in /usr/local/lib/python3.10/dist-packages/torch/lib/libtorch_cpu.so)
frame #13: at::_ops::empty_memory_format::call(c10::ArrayRef<c10::SymInt>, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional, c10::optional<c10::MemoryFormat>) + 0x1a3 (0x7f1f2fa613e3 in /usr/local/lib/python3.10/dist-packages/torch/lib/libtorch_cpu.so)
frame #14: torch::empty(c10::ArrayRef, c10::TensorOptions, c10::optional<c10::MemoryFormat>) + 0x23d (0x7f1e2dc1ce0d in /usr/local/lib/python3.10/dist-packages/vllm/_C.cpython-310-x86_64-linux-gnu.so)
frame #15: gptq_gemm(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor, bool, int) + 0x2dd (0x7f1e2dc18ffd in /usr/local/lib/python3.10/dist-packages/vllm/_C.cpython-310-x86_64-linux-gnu.so)
frame #16: + 0x94f62 (0x7f1e2dc31f62 in /usr/local/lib/python3.10/dist-packages/vllm/_C.cpython-310-x86_64-linux-gnu.so)
frame #17: + 0x90dac (0x7f1e2dc2ddac in /usr/local/lib/python3.10/dist-packages/vllm/_C.cpython-310-x86_64-linux-gnu.so)
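One plausible reading of these numbers (an assumption, not something the report confirms): the traceback shows vLLM serving the model, and vLLM sizes its memory pool as a fraction (`gpu_memory_utilization`, 0.9 by default) of the GPU's total memory, while processes 832 and 3711 already hold 1.31 GiB each. The minimal Python sketch below only redoes that arithmetic from the error text. The regexes, the helper names, and the 0.9 constant are illustrative assumptions; "capacty" is matched because that is how the message spells it.

```python
import re

# Figures copied from the RuntimeError in the traceback.
OOM_MESSAGE = (
    "Tried to allocate 172.00 MiB. GPU 0 has a total capacty of 22.19 GiB "
    "of which 14.50 MiB is free. Process 832 has 1.31 GiB memory in use. "
    "Process 3711 has 1.31 GiB memory in use. Including non-PyTorch memory, "
    "this process has 19.55 GiB memory in use."
)

# Assumption: vLLM's default gpu_memory_utilization, applied to TOTAL memory.
DEFAULT_GPU_MEMORY_UTILIZATION = 0.9


def to_gib(value: str, unit: str) -> float:
    """Convert a '<number> MiB|GiB' pair from the message into GiB."""
    return float(value) / 1024 if unit == "MiB" else float(value)


def summarize(message: str) -> tuple[float, float, float]:
    """Return (total GiB, GiB held by other processes, GiB held by this process)."""
    # "capaci?ty" matches both PyTorch's "capacty" and a correctly spelled "capacity".
    total = to_gib(*re.search(r"total capaci?ty of ([\d.]+) (MiB|GiB)", message).groups())
    # "Process <pid> has ..." lines; lowercase "this process has" is not matched here.
    others = sum(
        to_gib(value, unit)
        for value, unit in re.findall(r"Process \d+ has ([\d.]+) (MiB|GiB)", message)
    )
    this_process = to_gib(*re.search(r"this process has ([\d.]+) (MiB|GiB)", message).groups())
    return total, others, this_process


if __name__ == "__main__":
    total, others, this_process = summarize(OOM_MESSAGE)
    budget = DEFAULT_GPU_MEMORY_UTILIZATION * total  # what the engine tries to claim
    available = total - others                      # what other processes leave over
    print(f"total={total:.2f} GiB, others={others:.2f} GiB, this process={this_process:.2f} GiB")
    print(f"budget={budget:.2f} GiB vs available={available:.2f} GiB")
```

With these figures the assumed budget is about 19.97 GiB (0.9 × 22.19) while only about 19.57 GiB (22.19 − 2 × 1.31) is left after the other two processes, which would be consistent with an allocation failing once the engine nears its budget.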


May 15 '24 10:05 greatheart1000


How do I fix this error when running inference on the quantized model?
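A common workaround, assuming the out-of-memory error comes from the engine's memory budget rather than from the GPTQ kernel itself, is to free the GPU (stop processes 832 and 3711) or lower vLLM's `gpu_memory_utilization` so the budget fits in what is actually free. The helper below is hypothetical: `pick_gpu_memory_utilization`, its safety margin, and its ceiling are illustrative choices, not part of swift or vLLM. It relies only on `torch.cuda.mem_get_info()`, which returns `(free, total)` in bytes, and falls back to the figures in the traceback when no GPU is visible.

```python
def pick_gpu_memory_utilization(
    free_bytes: int,
    total_bytes: int,
    safety_margin: float = 0.05,
    ceiling: float = 0.9,
) -> float:
    """Pick a gpu_memory_utilization value that fits in currently free memory.

    Assumes the engine treats the value as a fraction of TOTAL device memory,
    so memory already held by other processes has to be left out.
    """
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    free_fraction = free_bytes / total_bytes
    # Keep a margin for CUDA context / allocator overhead and never exceed the ceiling.
    fraction = min(ceiling, free_fraction - safety_margin)
    if fraction <= 0:
        raise RuntimeError("not enough free GPU memory to start the engine")
    return round(fraction, 2)


if __name__ == "__main__":
    GIB = 1024 ** 3
    try:
        import torch  # optional: query the live GPU if one is available

        free, total = torch.cuda.mem_get_info()
    except Exception:
        # No torch or no usable CUDA device: use the traceback's figures
        # (22.19 GiB total, 2 x 1.31 GiB held by other processes).
        free, total = int(19.57 * GIB), int(22.19 * GIB)
    print(pick_gpu_memory_utilization(free, total))
```

With the traceback's figures this returns 0.83. Whether `swift infer` forwards such a value to vLLM (for example through a `--gpu_memory_utilization` argument) depends on the swift version and is an assumption here; check `swift infer --help` before relying on it.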
