
Already quantized to 4-bit and got the model pyllama-7B4b.pt, but it cannot run on an RTX 3080; it reports torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 86.00 MiB (GPU 0; 10.00 GiB total capacity; 9.24 GiB already allocated;

elven2016 opened this issue on Mar 29, 2023 · 2 comments

The error is as follows:

```
$ python webapp_single.py --ckpt_dir $CKPT_DIR --tokenizer_path $TOKENIZER_PATH
Traceback (most recent call last):
  File "/home/xxxx/chatllama/pyllama/apps/gradio/webapp_single.py", line 80, in <module>
    generator = load(
  File "/home/u/chatllama/pyllama/apps/gradio/webapp_single.py", line 42, in load
    model = Transformer(model_args)
  File "/home/xxxx/miniconda3/envs/chatllama/lib/python3.10/site-packages/llama/model_single.py", line 199, in __init__
    self.layers.append(TransformerBlock(layer_id, params))
  File "/home/xxxx/miniconda3/envs/chatllama/lib/python3.10/site-packages/llama/model_single.py", line 167, in __init__
    self.feed_forward = FeedForward(
  File "/home/xxxx/miniconda3/envs/chatllama/lib/python3.10/site-packages/llama/model_single.py", line 154, in __init__
    self.w3 = nn.Linear(dim, hidden_dim, bias=False)
  File "/home/xxxx/miniconda3/envs/chatllama/lib/python3.10/site-packages/torch/nn/modules/linear.py", line 96, in __init__
    self.weight = Parameter(torch.empty((out_features, in_features), **factory_kwargs))
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 86.00 MiB (GPU 0; 10.00 GiB total capacity; 9.24 GiB already allocated; 0 bytes free; 9.25 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
```
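The tail of the message itself suggests one mitigation: setting `max_split_size_mb` through `PYTORCH_CUDA_ALLOC_CONF` to reduce allocator fragmentation. Note, though, that here reserved memory (9.25 GiB) barely exceeds allocated memory (9.24 GiB), so fragmentation is unlikely to be the real problem. A minimal sketch of that mitigation anyway, with 128 MiB as an assumed example value, not a recommendation from pyllama:

```python
import os

# Must be set before the CUDA caching allocator initializes, i.e. before the
# first CUDA allocation; setting it before importing torch is the safe order.
# The value 128 is an assumed example.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

import torch  # imported after the env var so the allocator picks it up
```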

```
$ du -sh pyllama-7B4b.pt
3.6G    pyllama-7B4b.pt
```
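The 3.6G quantized checkpoint would comfortably fit in 10 GiB, but the traceback shows the OOM firing inside `Transformer.__init__`, while `nn.Linear` is still allocating full-size unquantized weights: a 7B-parameter model in fp16 needs roughly 7e9 × 2 bytes ≈ 13 GiB before the checkpoint is even read. A hedged workaround sketch is to build the model with CPU default tensors, load the checkpoint into host RAM, and only then move it to the GPU. This assumes `llama.model_single` exposes a `ModelArgs` alongside the `Transformer` seen in the traceback, that the 4-bit state dict loads into the module this way, and that the argument values below are placeholders for what a real run would read from the checkpoint's params.json:

```python
import torch
from llama.model_single import ModelArgs, Transformer  # module path taken from the traceback

# Hypothetical args; real values come from the checkpoint's params.json.
model_args = ModelArgs(max_seq_len=512, max_batch_size=1)

# Build with CPU tensors so the ~13 GiB of fp16 parameters never touch the 10 GiB GPU.
torch.set_default_tensor_type(torch.FloatTensor)
model = Transformer(model_args)

# Load the 4-bit checkpoint into host RAM first; strict=False because the
# quantized state dict may not match the fp16 module one-to-one.
state_dict = torch.load("pyllama-7B4b.pt", map_location="cpu")
model.load_state_dict(state_dict, strict=False)

# Move to the GPU only after the smaller quantized weights are in place.
model = model.half().cuda()
```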
