Majidur Rahman
This is the script that I used:

```bash
#!/bin/bash

MASTER_ADDR=localhost
MASTER_PORT=${2-2012}
NNODES=1
NODE_RANK=0
GPUS_PER_NODE=${3-16}

DISTRIBUTED_ARGS="--nproc_per_node $GPUS_PER_NODE \
                  --nnodes $NNODES \
                  --node_rank $NODE_RANK \
                  --master_addr $MASTER_ADDR \
                  --master_port $MASTER_PORT"

# model...
```
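(For context, the truncated part of the script would normally end in a launcher call of roughly the shape below. This is only a sketch, assuming a `torch.distributed.launch`-style launcher, which matches the underscore-style flags above; the entry point `pretrain_llama.py` and the model-parallel flag are placeholders, not taken from my script.)

```bash
# A minimal sketch, assuming a torch.distributed.launch-style launcher;
# "pretrain_llama.py" and "--model-parallel-size" are placeholders.
python -m torch.distributed.launch $DISTRIBUTED_ARGS \
    pretrain_llama.py --model-parallel-size ${MP_SIZE:-4}
```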
Yes, I tried running in a clean environment, but I'm still getting memory errors like the following:

```
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 14.00 MiB. GPU
```
I tried using MP_SIZE = 4, but I'm still getting the "out of memory" error. Do you have any suggestions in this regard?
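(In case it helps narrow things down, the generic knobs I know of are sketched below. `PYTORCH_CUDA_ALLOC_CONF` is a standard PyTorch allocator setting; the micro-batch variable is just a placeholder for whatever batch-size flag this repo uses.)

```bash
# A sketch of common CUDA OOM mitigations, not specific to this repo.
# PYTORCH_CUDA_ALLOC_CONF can reduce allocator fragmentation;
# lowering the per-GPU micro-batch size is usually the first lever.
export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128
MICRO_BATCH_SIZE=1   # placeholder name for the repo's batch-size setting
```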
Interestingly, I successfully ran the script for "LLaMA2-7B" with MP_SIZE = 4.
I am checking to see whether I am doing this correctly:

1. I have downloaded LLaMA2-13B from Hugging Face (https://huggingface.co/meta-llama/Llama-2-13b-hf); a download sketch follows after this list.
2. I have generated a weight configuration file named "llama2.json,"...
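(For step 1, a minimal download sketch, assuming the `huggingface_hub` CLI is installed and access to the gated meta-llama repo has been approved; the target directory is a placeholder.)

```bash
# A sketch for step 1, assuming the huggingface_hub CLI is installed
# (pip install -U "huggingface_hub[cli]") and gated-repo access is granted.
huggingface-cli login   # paste a read-scoped HF access token
huggingface-cli download meta-llama/Llama-2-13b-hf --local-dir ./llama2-13b-hf
```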