
How much RAM and VRAM does the model need at runtime?

Open MingJiaAn opened this issue 1 year ago • 6 comments

On my machine, 60 GB of system RAM is exhausted during AutoModelForCausalLM.from_pretrained.

MingJiaAn avatar Mar 22 '23 09:03 MingJiaAn

We recommend running on a GPU. Without quantization, loading the 7B model needs roughly 28 GB of VRAM.
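For reference, the 28 GB figure is roughly parameter count times bytes per parameter. A quick back-of-the-envelope sketch (assuming exactly 7e9 parameters; real checkpoints differ slightly, and activations/KV cache are excluded):

# Rough weight-only memory for a ~7B-parameter model.
params = 7e9
for dtype, bytes_per_param in [("fp32", 4), ("fp16/bf16", 2), ("int8", 1)]:
    print(f"{dtype}: ~{params * bytes_per_param / 1024**3:.0f} GiB")
# fp32: ~26 GiB   fp16/bf16: ~13 GiB   int8: ~7 GiB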

bestpredicts avatar Mar 22 '23 10:03 bestpredicts

We recommend running on a GPU. Without quantization, loading the 7B model needs roughly 28 GB of VRAM.

It is running on the GPU, but system RAM spikes first while the model is being loaded.
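The RAM spike usually comes from materializing the full fp32 state dict on CPU before the weights are moved to the GPU. A minimal sketch of loading with a smaller CPU footprint (assumes a transformers version with accelerate-backed loading; 8-bit loading would additionally require bitsandbytes):

import torch
from transformers import AutoModelForCausalLM

# low_cpu_mem_usage streams weights shard by shard instead of building the whole
# state dict in RAM; device_map="auto" places them straight onto the GPU(s).
model = AutoModelForCausalLM.from_pretrained(
    "BelleGroup/BELLE-7B-2M",
    torch_dtype=torch.float16,   # halves both RAM and VRAM vs. fp32
    low_cpu_mem_usage=True,
    device_map="auto",
)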

MingJiaAn avatar Mar 22 '23 10:03 MingJiaAn

Colab Pro:

!pip uninstall transformers -y
!pip install bitsandbytes
!pip install -q datasets loralib sentencepiece
!pip install transformers
!pip install -q git+https://github.com/huggingface/peft.git
from transformers import AutoTokenizer, AutoModelForCausalLM, AutoConfig
import sys
import os
import torch
import torch.nn as nn
import bitsandbytes as bnb
from datasets import load_dataset
import transformers
# prepare_model_for_int8_training is the older peft API; recent peft versions rename it to prepare_model_for_kbit_training
from peft import prepare_model_for_int8_training, LoraConfig, get_peft_model, get_peft_model_state_dict

MICRO_BATCH_SIZE = 4  # this could actually be 5 but i like powers of 2
BATCH_SIZE = 128
GRADIENT_ACCUMULATION_STEPS = BATCH_SIZE // MICRO_BATCH_SIZE
EPOCHS = 3  # we don't need 3 tbh
LEARNING_RATE = 3e-4  # the Karpathy constant
CUTOFF_LEN = 256  # 256 accounts for about 96% of the data
LORA_R = 8
LORA_ALPHA = 16
LORA_DROPOUT = 0.05
VAL_SET_SIZE = 2000

model_path = "./"  # path for a locally stored copy of the model (not used below; the hub ID is passed directly)
model =  AutoModelForCausalLM.from_pretrained(
    "BelleGroup/BELLE-7B-2M",
    load_in_8bit=True,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("BelleGroup/BELLE-7B-2M", add_eos_token=True)
print("Human:")
line = input()
while line:
        inputs = 'Human: ' + line.strip() + '\n\nAssistant:'
        input_ids = tokenizer(inputs, return_tensors="pt").input_ids
        input_ids = input_ids.to(model.device)
        outputs = model.generate(input_ids, max_new_tokens=200, do_sample = True, top_k = 30, top_p = 0.85, temperature = 0.35, repetition_penalty=1.2)
        rets = tokenizer.batch_decode(outputs, skip_special_tokens=True)
        print("Assistant:\n" + rets[0].strip().replace(inputs, ""))
        print("\n------------------------------------------------\nHuman:")
        line = input()
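Note that the LoRA imports and hyperparameters above are not actually used by the inference loop; they would only matter for fine-tuning. A minimal sketch of how they are typically wired together, reusing the names from the snippet above (the target_modules value is an assumption for BLOOM-style attention layers; adjust for your checkpoint):

model = prepare_model_for_int8_training(model)
lora_config = LoraConfig(
    r=LORA_R,
    lora_alpha=LORA_ALPHA,
    lora_dropout=LORA_DROPOUT,
    target_modules=["query_key_value"],  # assumption: BLOOM-style attention projection
    bias="none",
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()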

temberature avatar Mar 22 '23 14:03 temberature

If we want to further train the model, how many GPUs are needed? Are 8 A100s with 42 GB of VRAM each enough?

TestNLP avatar Mar 23 '23 07:03 TestNLP

You can estimate it with this guide: https://huggingface.co/docs/transformers/perf_train_gpu_one#anatomy-of-models-memory

If we want to further train the model, how many GPUs are needed? Are 8 A100s with 42 GB of VRAM each enough?
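A rough estimate following that guide, assuming AdamW with mixed precision, about 18 bytes per parameter for weights, gradients, and optimizer states (activations excluded):

params = 7e9
bytes_per_param = 18          # ~2 (fp16 weights) + 4 (fp32 master copy) + 4 (grads) + 8 (AdamW states)
total_gib = params * bytes_per_param / 1024**3
print(f"~{total_gib:.0f} GiB in total")                 # ~117 GiB
print(f"~{total_gib / 8:.0f} GiB per GPU if sharded")   # ~15 GiB per GPU with ZeRO-style sharding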

Degfy avatar Mar 23 '23 10:03 Degfy

If we want to further train the model, how many GPUs are needed? Are 8 A100s with 42 GB of VRAM each enough?

That is enough; we recommend DeepSpeed ZeRO stage 2 or higher.
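A minimal sketch of enabling ZeRO stage 2 through the HF Trainer, passing a DeepSpeed config dict to TrainingArguments and reusing the hyperparameters from the snippet above (values are illustrative; CPU offload is optional):

from transformers import TrainingArguments

ds_config = {
    "zero_optimization": {
        "stage": 2,                               # shard optimizer states and gradients
        "offload_optimizer": {"device": "cpu"},   # optional: push optimizer states to CPU RAM
    },
    "fp16": {"enabled": "auto"},
    "train_micro_batch_size_per_gpu": "auto",
    "gradient_accumulation_steps": "auto",
}

training_args = TrainingArguments(
    output_dir="./out",
    per_device_train_batch_size=MICRO_BATCH_SIZE,
    gradient_accumulation_steps=GRADIENT_ACCUMULATION_STEPS,
    learning_rate=LEARNING_RATE,
    num_train_epochs=EPOCHS,
    fp16=True,
    deepspeed=ds_config,   # TrainingArguments also accepts a path to a JSON config file
)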

barius avatar Mar 23 '23 14:03 barius

As I recall, the batch size used for LLaMA pretraining was very large, in the millions (of tokens). If the batch size becomes much smaller during fine-tuning, could training diverge?
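The usual way to keep the effective batch size large on limited hardware is gradient accumulation, as with the MICRO_BATCH_SIZE / GRADIENT_ACCUMULATION_STEPS pair in the snippet above: each optimizer step then sees many more examples than fit on the GPU at once. A small arithmetic sketch (numbers are illustrative):

# Effective batch size with gradient accumulation and data parallelism.
micro_batch_per_gpu = 4
accumulation_steps = 32
num_gpus = 8
effective_batch = micro_batch_per_gpu * accumulation_steps * num_gpus
print(effective_batch)  # 1024 examples per optimizer step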

zhangbo2008 avatar Apr 03 '23 07:04 zhangbo2008