
execute fastchat.serve.cli error

[Open] ch930410 opened this issue 1 year ago • 5 comments

Executed command: python -m fastchat.serve.cli --model-path ./model_weights/lmsys/vicuna-7b-delta-v1.1 --load-8bit

Error content:

Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████| 2/2 [00:04<00:00, 2.44s/it]

Traceback (most recent call last):

in _run_module_as_main:198
in _run_code:88

G:\FastChat\fastchat\serve\cli.py:132 in <module>

  129             choices=["simple", "rich"], help="Display style.")
  130     parser.add_argument("--debug", action="store_true")
  131     args = parser.parse_args()
❱ 132     main(args)
  133

G:\FastChat\fastchat\serve\cli.py:108 in main

  105     else:
  106         raise ValueError(f"Invalid style for console: {args.style}")
  107     try:
❱ 108         chat_loop(args.model_path, args.device, args.num_gpus, args.max_gpu_memory,
  109             args.load_8bit, args.conv_template, args.temperature, args.max_new_tokens,
  110             chatio, args.debug)
  111     except KeyboardInterrupt:

G:\FastChat\fastchat\serve\inference.py:182 in chat_loop

  179             max_new_tokens: int, chatio: ChatIO,
  180             debug: bool):
  181     # Model
❱ 182     model, tokenizer = load_model(model_path, device,
  183         num_gpus, max_gpu_memory, load_8bit, debug)
  184     is_chatglm = "chatglm" in str(type(model)).lower()
  185

G:\FastChat\fastchat\serve\inference.py:87 in load_model

   84         raise_warning_for_old_weights(model_path, model)
   85
   86     if load_8bit:
❱  87         compress_module(model, device)
   88
   89     if (device == "cuda" and num_gpus == 1) or device == "mps":
   90         model.to(device)

G:\FastChat\fastchat\serve\compression.py:42 in compress_module

   39         target_attr = getattr(module, attr_str)
   40         if type(target_attr) == torch.nn.Linear:
   41             setattr(module, attr_str,
❱  42                 CLinear(target_attr.weight, target_attr.bias, target_device))
   43     for name, child in module.named_children():
   44         compress_module(child, target_device)
   45

G:\FastChat\fastchat\serve\compression.py:29 in __init__

   26     def __init__(self, weight, bias, device):
   27         super().__init__()
   28
❱  29         self.weight = compress(weight.data.to(device), default_compression_config)
   30         self.bias = bias
   31
   32     def forward(self, input: Tensor) -> Tensor:

D:\python\Lib\site-packages\torch\cuda\__init__.py:239 in _lazy_init

  236                 "Cannot re-initialize CUDA in forked subprocess. To use CUDA with "
  237                 "multiprocessing, you must use the 'spawn' start method")
  238         if not hasattr(torch._C, '_cuda_getDeviceCount'):
❱ 239             raise AssertionError("Torch not compiled with CUDA enabled")
  240         if _cudart is None:
  241             raise AssertionError(
  242                 "libcudart functions unavailable. It looks like you have a broken build?

AssertionError: Torch not compiled with CUDA enabled

[screenshots of the error]

ch930410 · Apr 20 '23

Just a helpful hint: if you paste in an image instead of text, it makes it hard for people hitting this bug to find the solution if it ever gets posted here.

Anyway, it looks like you're on Windows, so I hope you have an NVIDIA GPU. If so, make sure you have the CUDA build of PyTorch installed, along with the rest of the CUDA stack:

https://pub.towardsai.net/installing-pytorch-with-cuda-support-on-windows-10-a38b1134535e

If you do and it still fails, it could be a bug.
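
For reference, a quick way to check whether the installed PyTorch build supports CUDA at all (a minimal diagnostic sketch, not FastChat-specific):

```python
import torch

# A CPU-only wheel typically reports a version suffix like "2.0.0+cpu"
# and carries no CUDA runtime.
print(torch.__version__)
print(torch.version.cuda)         # None on a CPU-only build
print(torch.cuda.is_available())  # False here matches the AssertionError above
```

If this prints False, reinstalling PyTorch from the CUDA-enabled wheels on pytorch.org should fix it.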

sirus20x6 · Apr 20 '23

I am currently running python3 -m fastchat.serve.cli --model-path /path/to/vicuna/weights --device cpu, which works, but it takes 100% CPU and 60% of memory.

When I execute python3 -m fastchat.serve.cli --model-path /path/to/vicuna/weights --load-8bit, it reports the error "Torch not compiled with CUDA enabled". I am not very clear about your answer; can you elaborate on what to do?

Thank you!

ch930410 · Apr 21 '23

command: python -m fastchat.serve.cli --model-path ./model_weights/lmsys/vicuna-7b-delta-v1.1 --load-8bit

error info: OutOfMemoryError: CUDA out of memory. Tried to allocate 86.00 MiB (GPU 0; 6.00 GiB total capacity; 4.01 GiB already allocated; 84.00 MiB free; 4.55 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
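
For what it's worth, the max_split_size_mb hint in the error message is applied through the environment variable it names. A minimal sketch (the value 128 is an assumption to tune, and it only helps with fragmentation, not with a model that simply doesn't fit):

```python
import os

# Must be set before the first CUDA allocation; 128 MiB is an arbitrary starting point.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

import torch  # imported after the variable is set so the allocator picks it up
```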

ch930410 · Apr 21 '23

I met the same error. Have you fixed it?

chenglimin · May 05 '23

You are running out of memory.
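
A rough back-of-the-envelope check of why (a sketch, assuming ~7B parameters stored at one byte each after --load-8bit):

```python
# Approximate on-GPU weight footprint of vicuna-7b in 8-bit.
params = 7_000_000_000             # ~7B parameters (assumption)
weights_gib = params * 1 / 2**30   # 1 byte per parameter in int8
print(f"~{weights_gib:.1f} GiB")   # ~6.5 GiB for weights alone
```

That already exceeds the 6 GiB card above before counting activations and the KV cache, so the OOM is expected.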

zhisbug · May 08 '23

I have the same problem, and CUDA 12 is installed. Should I use CUDA 11? My GPU is an RTX 2080S.

ffreality · May 09 '23

Try a smaller model or use a better GPU!

merrymercy · May 20 '23

Not necessarily a memory issue. I had the same error on an NVIDIA Quadro RTX 6000 (24 GB VRAM). Lowering precision to 8-bit lets you run 13B models on the GPU. Anyway, the issue was CUDA: somebody posted their working setup, and I noticed they were on CUDA version 11 while I was on 12.2. It just wouldn't work until I downgraded to 11.7.
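
If you want to confirm a mismatch before downgrading, a quick check (a sketch; compare the printed runtime version with what nvidia-smi reports for your driver):

```python
import torch

print(torch.version.cuda)  # CUDA runtime the wheel was built against, e.g. "11.7"
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))  # sanity-check the visible GPU
else:
    print("CUDA unavailable: possible build/driver mismatch")
```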

Nemesis-the-Warlock · Jul 07 '23