inference ---- out of memory
Which demo is the parallelized inference code? I invoked it as follows according to the README, but it keeps failing with an error: CUDA_VISIBLE_DEVICES=4,5,6,7 python cli_demo_sat.py --from_pretrained /data/CogCoM/CogCoM/cogcom-chat-17b --local_tokenizer /data/CogCoM/CogCoM-main/vicuna-7b-v1.5 --fp16 --quant 8 --english --nproc_per_node 4
(The server has 8× V100-32G GPUs.)
Error output:
[2024-07-08 18:19:08,531] [INFO] [real_accelerator.py:133:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-07-08 18:19:10,471] [WARNING] Failed to load bitsandbytes:No module named 'bitsandbytes'
[2024-07-08 18:19:12,346] [INFO] building CogCoMModel model ...
[2024-07-08 18:19:12,348] [INFO] [RANK 0] > initializing model parallel with size 1
[2024-07-08 18:19:12,349] [INFO] [RANK 0] You didn't pass in LOCAL_WORLD_SIZE environment variable. We use the guessed LOCAL_WORLD_SIZE=1. If this is wrong, please pass the LOCAL_WORLD_SIZE manually.
[2024-07-08 18:19:12,349] [INFO] [RANK 0] You are using model-only mode.
For torch.distributed users or loading model parallel models, set environment variables RANK, WORLD_SIZE and LOCAL_RANK.
[2024-07-08 18:19:27,219] [INFO] [RANK 0] > number of parameters on model parallel rank 0: 17639685376
[2024-07-08 18:19:45,973] [INFO] [RANK 0] CUDA out of memory. Tried to allocate 86.00 MiB. GPU
[2024-07-08 18:19:45,974] [INFO] [RANK 0] global rank 0 is loading checkpoint /data/CogCoM/CogCoM/cogcom-chat-17b/50000/mp_rank_00_model_states.pt
[2024-07-08 18:20:19,633] [INFO] [RANK 0] > successfully loaded /data/CogCoM/CogCoM/cogcom-chat-17b/50000/mp_rank_00_model_states.pt
[2024-07-08 18:20:21,178] [INFO] [RANK 0] > Quantizing model weight to 8 bits
[rank0]: Traceback (most recent call last):
[rank0]: File "/data/CogCoM/CogCoM-main/cogcom/demo/cli_demo_sat_zd.py", line 167, in
You can invoke it as follows: torchrun --standalone --nnodes=1 --nproc-per-node=4 cogcom/demo/cli_demo_sat.py --from_pretrained /data/CogCoM/CogCoM/cogcom-chat-17b --local_tokenizer /data/CogCoM/CogCoM-main/vicuna-7b-v1.5 --fp16 --quant 8 — add the quantization flag only if you need it. @AugWrite
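For clarity, a minimal launch sketch combining the suggested torchrun invocation with a GPU restriction. This assumes you still want to pin the job to GPUs 4–7 via CUDA_VISIBLE_DEVICES as in the original attempt; the key difference from the failing command is that torchrun (not plain python) sets up the distributed environment (RANK, WORLD_SIZE, LOCAL_RANK), so the model is sharded across the 4 processes instead of loaded entirely on one 32 GB card:

```shell
# Restrict the job to 4 of the 8 V100s (optional; adjust to your setup).
export CUDA_VISIBLE_DEVICES=4,5,6,7

# torchrun spawns 4 processes and exports RANK / WORLD_SIZE / LOCAL_RANK
# for each, which the script needs for model-parallel loading.
torchrun --standalone --nnodes=1 --nproc-per-node=4 \
    cogcom/demo/cli_demo_sat.py \
    --from_pretrained /data/CogCoM/CogCoM/cogcom-chat-17b \
    --local_tokenizer /data/CogCoM/CogCoM-main/vicuna-7b-v1.5 \
    --fp16 \
    --quant 8   # optional: drop this flag to skip 8-bit quantization
```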
Wow, thank you! That solved it!! (^ ▽ ^)