
[BUG] When I track FLOPs with the FlopsProfiler, why do the FLOPs keep growing?

Open · GongCQ opened this issue on Sep 05, 2022

The code is:

(Package versions: transformers==4.21.1, torch==1.11.0, deepspeed==0.6.5, CUDA 11.3, GPU: RTX 3090)

import torch
from transformers import BertTokenizer, BartForConditionalGeneration, BertModel, BertLMHeadModel
from transformers.activations import GELUActivation
from deepspeed.profiling.flops_profiler import FlopsProfiler
from torch.optim import SGD
import datetime as dt

model = BertLMHeadModel.from_pretrained('/workspace/data/model/bert-base-uncased').cuda()

# Count parameters directly for comparison with the profiler's report.
param_list = [p for p in model.parameters(recurse=True)]
params_num = sum([p.nelement() for p in param_list])
print('params_num: %sy' % (params_num / 1e8))  # reported in units of 1e8
params_config = [{'params': model.parameters(recurse=True), 'lr': 1e-5}]
opt = SGD(params=params_config, lr=1e-5)

# A fixed random batch, so every step does the same amount of work.
input_ids = torch.randint(0, 10000, (12, 384)).long().cuda()
labels = torch.randint(0, 10000, (12, 384)).long().cuda()

ds_prof = FlopsProfiler(model)

for i in range(100):
    print('%s ========' % i)
    # Profile every iteration: start, stop, and end the profiler per step.
    ds_prof.start_profile()

    bt = dt.datetime.now()
    loss = model(input_ids=input_ids, labels=labels).loss
    loss.backward()
    opt.step()
    opt.zero_grad()
    et = dt.datetime.now()

    ds_prof.stop_profile()
    flops_ds = ds_prof.get_total_flops(as_string=False)
    params_ds = ds_prof.get_total_params(as_string=False)
    ds_prof.end_profile()

    print('flops: %sy' % (flops_ds / 1e8))
    print('params: %sy' % (params_ds / 1e8))
    print('duration: %s' % (et - bt).total_seconds())
    print()

And the output is:

0 ========
flops: 10701.69587712y
params: 1.09514298y
duration: 2.365325

1 ========
flops: 10706.29650432y
params: 1.09514298y
duration: 0.220434

2 ========
flops: 11363.1952896y
params: 1.09514298y
duration: 0.216231

3 ========
flops: 12020.09407488y
params: 1.09514298y
duration: 0.223515

4 ========
flops: 12676.99286016y
params: 1.09514298y
duration: 0.243116

5 ========
flops: 13333.89164544y
params: 1.09514298y
duration: 0.229761

6 ========
flops: 13990.79043072y
params: 1.09514298y
duration: 0.236776

7 ========
flops: 14647.689216y
params: 1.09514298y
duration: 0.252339

8 ========
flops: 15304.58800128y
params: 1.09514298y
duration: 0.292576

9 ========
flops: 15961.48678656y
params: 1.09514298y
duration: 0.250491

10 ========
flops: 16618.38557184y
params: 1.09514298y
duration: 0.256977

11 ========
flops: 17275.28435712y
params: 1.09514298y
duration: 0.29402
……

Why do the FLOPs keep getting larger and larger?

GongCQ · Sep 05, 2022

I have the same issue. It is not just FLOPs but also MACs. module.__flops__ and module.__macs__ are computed in a post-hook in the profiler code:

https://github.com/microsoft/DeepSpeed/blob/80f94c10c552ec79473775adb8902b210656ed76/deepspeed/profiling/flops_profiler/profiler.py#L91-L95

module_flop_count[-1] and module_mac_count[-1] hold more and more profiled data as profiling proceeds, resulting in ever-growing FLOPs and MACs.
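To make the effect concrete, here is a rough, self-contained sketch of the pattern being described; it is not the actual DeepSpeed profiler code, only a hypothetical simplification reusing the names module_flop_count and module.__flops__ from above:

import torch
import torch.nn as nn

# Hypothetical global: patched ops append (op_name, flops) entries here,
# and (in this simplification) the entries are never cleared between steps.
module_flop_count = [[]]

def post_hook(module, inputs, outputs):
    # Sums everything currently recorded; stale entries from earlier steps
    # make each step's total larger than the work actually done this step.
    module.__flops__ = getattr(module, '__flops__', 0) + sum(
        flops for _, flops in module_flop_count[-1])

layer = nn.Linear(4, 4)
layer.register_forward_hook(post_hook)

for step in range(3):
    module_flop_count[-1].append(('linear', 100))  # this step's "profiled" work
    layer(torch.randn(1, 4))
    print(step, layer.__flops__)  # grows faster than 100 per step: 100, 300, 600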

I am using the Hugging Face GPT2Model for profiling, and FLOPs and MACs keep increasing even after applying the patch from #2106.

insujang · Sep 11, 2022

Hi @GongCQ and @insujang, this is expected if start_profile and end_profile are not placed correctly. The profiler is supposed to be started and ended within a single training step (in your code, you are accumulating captured FLOPs across the i steps). Please refer to the example here: https://github.com/microsoft/DeepSpeed/tree/master/deepspeed/profiling/flops_profiler#example-training-workflow.
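For reference, a sketch of that linked workflow adapted to the loop above, profiling only one chosen step rather than every iteration (profile_step is an arbitrary choice here, and model, opt, input_ids, labels, ds_prof are the objects from the original code):

profile_step = 5  # assumed: the single step to profile

for i in range(100):
    if i == profile_step:
        ds_prof.start_profile()

    loss = model(input_ids=input_ids, labels=labels).loss

    if i == profile_step:
        # Collect the numbers for this one step, then tear the profiler down.
        ds_prof.stop_profile()
        flops_ds = ds_prof.get_total_flops(as_string=False)
        params_ds = ds_prof.get_total_params(as_string=False)
        ds_prof.end_profile()
        print('flops: %sy' % (flops_ds / 1e8))
        print('params: %sy' % (params_ds / 1e8))

    loss.backward()
    opt.step()
    opt.zero_grad()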

cli99 · Oct 25, 2022