FastChat
Error when executing fastchat.serve.cli
Executed command: python -m fastchat.serve.cli --model-path ./model_weights/lmsys/vicuna-7b-delta-v1.1 --load-8bit
Error output:
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████| 2/2 [00:04<00:00, 2.44s/it]
Traceback (most recent call last):
  in _run_module_as_main:198
  in run_code:88
  G:\FastChat\fastchat\serve\cli.py:132 in
  (rest of traceback truncated)
Just a helpful hint: if you paste in an image instead of text, it makes it hard for people with this bug to find the solution if it ever gets posted here.
Anyway, it looks like you're on Windows, so I hope you have an NVIDIA GPU. If so, make sure you have the CUDA version of PyTorch installed, along with all the CUDA dependencies:
https://pub.towardsai.net/installing-pytorch-with-cuda-support-on-windows-10-a38b1134535e
If you do, it could be a bug.
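A quick way to check which PyTorch build you have (nothing FastChat-specific; the version strings below are just examples):

python -c "import torch; print(torch.__version__, torch.cuda.is_available())"

A CUDA build prints something like "2.0.1+cu117 True"; a CPU-only build prints something like "2.0.1+cpu False". If it prints False on a machine with an NVIDIA GPU, the CPU-only wheel is almost certainly what's installed.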
I am currently running python3 -m fastchat.serve.cli --model-path /path/to/vicuna/weights --device cpu, which works, but it uses 100% of the CPU and 60% of memory.
When executing python3 -m fastchat.serve.cli --model-path /path/to/vicuna/weights --load-8bit, I get the error "Torch not compiled with CUDA enabled". I am not very clear about your answer; can you elaborate on how to do it?
Thank you!
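That error ("Torch not compiled with CUDA enabled") means the CPU-only PyTorch wheel is installed, and --load-8bit needs a CUDA build. A sketch of the fix, assuming the CUDA 11.7 wheels; check the selector at https://pytorch.org for the exact command matching your CUDA version:

pip uninstall -y torch
pip install torch --index-url https://download.pytorch.org/whl/cu117

After that, --load-8bit should at least get past this error, provided your GPU has enough memory.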
Command: python -m fastchat.serve.cli --model-path ./model_weights/lmsys/vicuna-7b-delta-v1.1 --load-8bit
Error info: OutOfMemoryError: CUDA out of memory. Tried to allocate 86.00 MiB (GPU 0; 6.00 GiB total capacity; 4.01 GiB already allocated; 84.00 MiB free; 4.55 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
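For what it's worth, a 6 GiB card is tight for a 7B model even in 8-bit. The fragmentation workaround the error message itself suggests is set via an environment variable before launching (the 128 value is only an example):

set PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128      (Windows cmd)
export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128   (Linux/macOS)

This only helps when reserved memory far exceeds allocated memory; it won't create VRAM you don't have.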
I met the same error. Have you fixed it?
You are running out of memory.
I have the same problem, with CUDA 12 installed. Should I use CUDA 11? My GPU is an RTX 2080S.
Try a smaller model or use a better GPU!
Not necessarily a memory issue. I had the same error on an NVIDIA Quadro RTX 6000 (24 GB VRAM). Lowering precision to 8-bit lets you run 13B models on the GPU. Anyway, the issue was CUDA: somebody posted his working setup, and I noticed he was on CUDA version 11 while I was on 12.2, and it just wouldn't work until I downgraded to 11.7.
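If you suspect the same version mismatch, compare what the driver reports with what your PyTorch wheel targets (quick checks, not FastChat-specific):

nvidia-smi                                            (driver-side CUDA version shown in the header)
python -c "import torch; print(torch.version.cuda)"   (CUDA version your torch wheel was built for)

A wheel built for an older CUDA (e.g. 11.7) generally runs on a newer driver, but mixing toolkits, wheels, and libraries across major CUDA versions is a common source of exactly this kind of failure.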