pyllama
Model does not split for 65B
I have 8 × 80 GB A100 GPUs. I can't run this project correctly, although I can run the official example.py.

torchrun --nproc_per_node 8 webapp.py --ckpt_dir /nvme/syx/llama/model/65B/65B/ --tokenizer_path /nvme/syx/llama/model/tokenizer.model
Output:
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 344.00 MiB (GPU 0; 79.20 GiB total capacity; 77.97 GiB already allocated; 297.25 MiB free; 77.97 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
File "/nvme/syx/pyllama/apps/gradio/webapp.py", line 54, in load model = Transformer(model_args) File "/home/songyixin/miniconda3/envs/llama-serve/lib/python3.9/site-packages/llama/model_single.py", line 200, in init self.layers.append(TransformerBlock(layer_id, params)) File "/home/songyixin/miniconda3/envs/llama-serve/lib/python3.9/site-packages/llama/model_single.py", line 168, in init self.feed_forward = FeedForward( File "/home/songyixin/miniconda3/envs/llama-serve/lib/python3.9/site-packages/llama/model_single.py", line 155, in init self.w3 = nn.Linear(dim, hidden_dim, bias=False) File "/home/songyixin/miniconda3/envs/llama-serve/lib/python3.9/site-packages/torch/nn/modules/linear.py", line 96, in init self.weight = Parameter(torch.empty((out_features, in_features), **factory_kwargs))
So how do I switch to model_parallel.py?
I updated __init__.py to switch from model_single.py to model_parallel.py, but when I send a prompt the app hangs.
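The edit was along these lines; treat it as a sketch, since the exact symbols re-exported by llama/__init__.py may differ between pyllama versions:

```python
# llama/__init__.py (sketch of my change; symbol names may vary by version)
# before:
# from llama.model_single import ModelArgs, Transformer
# after:
from llama.model_parallel import ModelArgs, Transformer
```

When it hangs, the traceback (interleaved across the eight ranks, de-duplicated here) looks like this: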
File "/home/songyixin/miniconda3/envs/llama-serve/lib/python3.9/site-packages/gradio/blocks.py", line 1717, in block_thread
File "/home/songyixin/miniconda3/envs/llama-serve/lib/python3.9/site-packages/gradio/blocks.py", line 1524, in launch
self.share_url = networking.setup_tunnel(
File "/home/songyixin/miniconda3/envs/llama-serve/lib/python3.9/site-packages/gradio/networking.py", line 168, in setup_tunnel
self.share_url = networking.setup_tunnel(
File "/home/songyixin/miniconda3/envs/llama-serve/lib/python3.9/site-packages/gradio/networking.py", line 168, in setup_tunnel
self.share_url = networking.setup_tunnel(
File "/home/songyixin/miniconda3/envs/llama-serve/lib/python3.9/site-packages/gradio/networking.py", line 168, in setup_tunnel
address = tunnel.start_tunnel()
File "/home/songyixin/miniconda3/envs/llama-serve/lib/python3.9/site-packages/gradio/tunneling.py", line 61, in start_tunnel
address = tunnel.start_tunnel()
File "/home/songyixin/miniconda3/envs/llama-serve/lib/python3.9/site-packages/gradio/tunneling.py", line 61, in start_tunnel
self.share_url = networking.setup_tunnel(address = tunnel.start_tunnel()
self.share_url = networking.setup_tunnel( File "/home/songyixin/miniconda3/envs/llama-serve/lib/python3.9/site-packages/gradio/tunneling.py",
line 61, in start_tunnel
self.url = self._start_tunnel(binary_path) File "/home/songyixin/miniconda3/envs/llama-serve/lib/python3.9/site-packages/gradio/networking.py"
, line 168, in setup_tunnel
File "/home/songyixin/miniconda3/envs/llama-serve/lib/python3.9/site-packages/gradio/tunneling.py", line 97, in _start_tunnel
self.url = self._start_tunnel(binary_path) File "/home/songyixin/miniconda3/envs/llama-serve/lib/python3.9/site-packages/gradio/networking.py"
, line 168, in setup_tunnel
File "/home/songyixin/miniconda3/envs/llama-serve/lib/python3.9/site-packages/gradio/tunneling.py", line 97, in _start_tunnel
self.url = self._start_tunnel(binary_path)
File "/home/songyixin/miniconda3/envs/llama-serve/lib/python3.9/site-packages/gradio/tunneling.py", line 97, in _start_tunnel
line = self.proc.stdout.readline()
line = self.proc.stdout.readline()
That is a really good question, @YixinSong-e.
Please check this: https://github.com/juncongmoo/pyllama#%EF%B8%8F-official-way-1
export PYLLAMA_META_MP=1
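With the variable exported in the same shell before torchrun, the 65B weights should be split across the eight GPUs. Adapting your command (assuming webapp.py honours PYLLAMA_META_MP the same way the README examples do):

```sh
export PYLLAMA_META_MP=1
torchrun --nproc_per_node 8 webapp.py \
    --ckpt_dir /nvme/syx/llama/model/65B/65B/ \
    --tokenizer_path /nvme/syx/llama/model/tokenizer.model
```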
Thanks for the advice. Now I can set up the service, but it still hangs; when the run is interrupted, the output (interleaved across ranks, de-duplicated below) shows every rank stuck in Gradio's tunnel setup:
File "/home/songyixin/miniconda3/envs/llama/lib/python3.9/site-packages/gradio/networking.py", line 166, in setup_tunnel
File "/home/songyixin/miniconda3/envs/llama/lib/python3.9/site-packages/gradio/tunneling.py", line 95, in _start_tunnel
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 196848 closing signal SIGINT
address = tunnel.start_tunnel()if self.proc.stdout is None:
address = tunnel.start_tunnel() File "/home/songyixin/miniconda3/envs/llama/lib/python3.9/site-packages/gradio/tunneling.py", line 60, in
start_tunnel
File "/home/songyixin/miniconda3/envs/llama/lib/python3.9/site-packages/gradio/networking.py", line 166, in setup_tunnel
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 196849 closing signal SIGINT
File "/home/songyixin/miniconda3/envs/llama/lib/python3.9/site-packages/gradio/tunneling.py", line 60, in start_tunnel
File "/home/songyixin/miniconda3/envs/llama/lib/python3.9/site-packages/gradio/tunneling.py", line 60, in start_tunnel
self.url = self._start_tunnel(binary_path)address = tunnel.start_tunnel()
File "/home/songyixin/miniconda3/envs/llama/lib/python3.9/site-packages/gradio/tunneling.py", line 97, in _start_tunnel
File "/home/songyixin/miniconda3/envs/llama/lib/python3.9/site-packages/gradio/tunneling.py", line 60, in start_tunnel
File "/home/songyixin/miniconda3/envs/llama/lib/python3.9/site-packages/gradio/tunneling.py", line 97, in _start_tunnel
line = self.proc.stdout.readline()
line = self.proc.stdout.readline() self.url = self._start_tunnel(binary_path)
KeyboardInterruptself.url = self._start_tunnel(binary_path) KeyboardInterrupt File "/home/songyixin/miniconda3/envs/llama/lib/python3.9/sit
e-packages/gradio/tunneling.py", line 97, in _start_tunnel
address = tunnel.start_tunnel()
I also encountered this problem. When the program reaches this point, it just hangs and no error is reported:

    i = tokens[:, prev_pos:cur_pos]
    logits = self.model(i, prev_pos)

My setup: 8 × V100 32 GB, NVIDIA-SMI 510.47.03, Driver Version 510.47.03, CUDA Version 11.6.
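A common cause of a hang at exactly this call in model-parallel inference is that only rank 0 (the process serving the Gradio UI) ever receives the prompt, so the other ranks never enter the matching forward pass and the collective ops inside the parallel layers block forever. A minimal sketch of the usual workaround, broadcasting the tokenized prompt from rank 0 to every rank before generation (my own illustration, not pyllama's actual API):

```python
import torch
import torch.distributed as dist

def broadcast_prompt(tokens: torch.Tensor) -> torch.Tensor:
    """Ensure every model-parallel rank generates from the same (batch, seq) tokens.

    Assumes torch.distributed is already initialized (torchrun does this) and
    that rank 0 is the process that received the prompt from the web UI.
    """
    if dist.get_rank() == 0:
        # Rank 0 first announces the shape of the token tensor.
        shape = torch.tensor(tokens.shape, dtype=torch.long, device="cuda")
        tokens = tokens.to(device="cuda", dtype=torch.long)
    else:
        shape = torch.empty(2, dtype=torch.long, device="cuda")
    dist.broadcast(shape, src=0)

    if dist.get_rank() != 0:
        # Non-zero ranks allocate a matching buffer and receive the tokens.
        tokens = torch.empty(tuple(shape.tolist()), dtype=torch.long, device="cuda")
    dist.broadcast(tokens, src=0)
    return tokens
```

The non-zero ranks then loop on this broadcast and the subsequent model forward instead of also trying to launch Gradio; only rank 0 should call the Gradio launch at all, which may also be why all eight ranks above appear stuck setting up a share tunnel.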