[BUG] When tracking FLOPs with FlopsProfiler, why do the FLOPs keep increasing?
The code is (package versions: transformers==4.21.1, torch==1.11.0, deepspeed==0.6.5, CUDA 11.3, GPU: RTX 3090):
```python
import torch
from transformers import BertTokenizer, BartForConditionalGeneration, BertModel, BertLMHeadModel
from transformers.activations import GELUActivation
from deepspeed.profiling.flops_profiler import FlopsProfiler
from torch.optim import SGD
import datetime as dt

model = BertLMHeadModel.from_pretrained('/workspace/data/model/bert-base-uncased').cuda()
param_list = [p for p in model.parameters(recurse=True)]
params_num = sum([p.nelement() for p in param_list])
print('params_num: %sy' % (params_num / 1e8))

params_config = [{'params': model.parameters(recurse=True), 'lr': 1e-5}]
opt = SGD(params=params_config, lr=1e-5)

input_ids = torch.randint(0, 10000, (12, 384)).long().cuda()
labels = torch.randint(0, 10000, (12, 384)).long().cuda()

ds_prof = FlopsProfiler(model)

for i in range(100):
    print('%s ========' % i)
    ds_prof.start_profile()

    bt = dt.datetime.now()
    loss = model(input_ids=input_ids, labels=labels).loss
    loss.backward()
    opt.step()
    opt.zero_grad()
    et = dt.datetime.now()

    ds_prof.stop_profile()
    flops_ds = ds_prof.get_total_flops(as_string=False)
    params_ds = ds_prof.get_total_params(as_string=False)
    ds_prof.end_profile()

    print('flops: %sy' % (flops_ds / 1e8))
    print('params: %sy' % (params_ds / 1e8))
    print('duration: %s' % (et - bt).total_seconds())
    print()
```
And the output is:

```
0 ========
flops: 10701.69587712y
params: 1.09514298y
duration: 2.365325
1 ========
flops: 10706.29650432y
params: 1.09514298y
duration: 0.220434
2 ========
flops: 11363.1952896y
params: 1.09514298y
duration: 0.216231
3 ========
flops: 12020.09407488y
params: 1.09514298y
duration: 0.223515
4 ========
flops: 12676.99286016y
params: 1.09514298y
duration: 0.243116
5 ========
flops: 13333.89164544y
params: 1.09514298y
duration: 0.229761
6 ========
flops: 13990.79043072y
params: 1.09514298y
duration: 0.236776
7 ========
flops: 14647.689216y
params: 1.09514298y
duration: 0.252339
8 ========
flops: 15304.58800128y
params: 1.09514298y
duration: 0.292576
9 ========
flops: 15961.48678656y
params: 1.09514298y
duration: 0.250491
10 ========
flops: 16618.38557184y
params: 1.09514298y
duration: 0.256977
11 ========
flops: 17275.28435712y
params: 1.09514298y
duration: 0.29402
……
```
Why do the FLOPs keep getting greater and greater?
I have the same issue. It is not just flops but also macs. `module.__flops__` and `module.__macs__` are calculated in the post hooks in the profiler code:
https://github.com/microsoft/DeepSpeed/blob/80f94c10c552ec79473775adb8902b210656ed76/deepspeed/profiling/flops_profiler/profiler.py#L91-L95
`module_flop_count[-1]` and `module_mac_count[-1]` accumulate more and more profiled data as profiling proceeds, resulting in growing flops and macs.

I am using the HuggingFace GPT2Model for profiling, and flops and macs keep increasing even after applying the patch from #2106.
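To illustrate the pattern being described (this is only an illustrative sketch with made-up names such as `flop_records`, not the actual DeepSpeed profiler code): a forward post-hook appends per-call counts to a shared list, so if that list is not cleared between profiling windows, each window's total also includes all previous calls.

```python
import torch
import torch.nn as nn

# made-up stand-in for an internal counter list that post-hooks append to
flop_records = []

def post_hook(module, inputs, output):
    # pretend every forward call costs 10 FLOPs
    flop_records.append(10)

layer = nn.Linear(4, 4)
layer.register_forward_hook(post_hook)

x = torch.randn(1, 4)
for step in range(3):
    layer(x)
    # without clearing flop_records between "profiling windows",
    # the per-step total keeps growing: 10, 20, 30, ...
    print('step %d total flops: %d' % (step, sum(flop_records)))
```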
Hi @GongCQ and @insujang, this is expected if `start_profile` and `end_profile` are not placed correctly. The profiler is supposed to be started and ended within a single training step (in your code, you are accumulating the captured flops across the `i` steps). Please refer to the example here: https://github.com/microsoft/DeepSpeed/tree/master/deepspeed/profiling/flops_profiler#example-training-workflow.
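For reference, a minimal adaptation of that documented workflow to the snippet above might look like the following (a sketch, reusing the `model`, `opt`, `input_ids`, and `labels` from the original post; `profile_step` is an arbitrary choice):

```python
ds_prof = FlopsProfiler(model)
profile_step = 5  # profile only this single training step

for i in range(100):
    # start profiling only on the chosen step
    if i == profile_step:
        ds_prof.start_profile()

    # forward pass
    loss = model(input_ids=input_ids, labels=labels).loss

    # stop and read out the profile for this single step, then clean up
    if i == profile_step:
        ds_prof.stop_profile()
        flops = ds_prof.get_total_flops(as_string=False)
        params = ds_prof.get_total_params(as_string=False)
        ds_prof.end_profile()
        print('flops: %s  params: %s' % (flops, params))

    # backward pass and weight update
    loss.backward()
    opt.step()
    opt.zero_grad()
```

Profiling a single representative step this way keeps the reported numbers tied to one forward pass instead of accumulating across iterations.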