DeepSpeedExamples
Inaccurate FLOP results after several rounds
Hi, I tried to use the method "get_model_profile" to measure the latency and FLOPs of my model. To reduce the influence of randomness, I call this method several times in a for loop and then average the results.
However, I found that the results for every round after the first one are not correct: they are far from the theoretical result. As shown in the figure below, the FLOPs increase with each round, which cannot be right, since I give the model an input of the same size every time.
And this is the code:
import math
import sys

import pandas as pd
import profiler  # local copy of the flops profiler module (see the commented import below)


def test_model(model, input_shape, warmup=20, num_tests=1000):
    results = []
    for _ in range(num_tests):
        #from profiler import get_model_profile
        flops, macs, params, latency = profiler.get_model_profile(
            model=model,
            input_shape=input_shape,
            print_profile=False,
            detailed=True,
            module_depth=-1,
            top_modules=1,
            warm_up=warmup,
            as_string=False
        )
        del sys.modules['profiler']
        # Convert to GFLOPs, GMACs, thousands of parameters, and milliseconds.
        results.append((flops/10**9, macs/10**9, params/10**3, latency*10**3))
    df = pd.DataFrame(results, columns=['FLOPs', 'MACs', 'Params', 'Latency'])
    return df


df_swin = test_model(Swin, (batch_size, math.prod(input_resolution), dim), warmup=warmup, num_tests=num_tests)
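To check how the per-round values drift, something like the following could be run on the FLOPs column. This is only a rough sketch: `flops_drift` and `looks_accumulated` are hypothetical helper names, not part of the profiler, and a constant step between rounds would only hint that counts are being accumulated rather than reset; it would not prove it.

```python
def flops_drift(flops_per_round):
    """Return each round's FLOPs divided by the first round's FLOPs."""
    if not flops_per_round:
        return []
    first = flops_per_round[0]
    return [value / first for value in flops_per_round]


def looks_accumulated(flops_per_round, rel_tol=1e-6):
    """Return True if consecutive rounds grow by a nearly constant positive step."""
    steps = [b - a for a, b in zip(flops_per_round, flops_per_round[1:])]
    if not steps or steps[0] <= 0:
        return False
    return all(abs(s - steps[0]) <= rel_tol * abs(steps[0]) for s in steps)
```

For the DataFrame above, the input would be `df_swin['FLOPs'].tolist()`. Ratios close to 1, 2, 3, ... would match the pattern of FLOPs growing with the round.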
I tried modifying this code and found that if I re-create the model in every iteration and import the profiler again, the result is correct, as shown in the figure below.
And the following is the modified code.
def test_model(input_shape, warmup=20, num_tests=1000):
    results = []
    for _ in range(num_tests):
        #from profiler import get_model_profile
        import profiler
        model = MySwinTransformerModel(dim, input_resolution, num_heads, window_size, mlp_ratio, depth).to(device)
        # model = MyTensorizedTransformerModel(dim, input_resolution, num_heads, n_proj, mlp_ratio, depth).to(device)
        flops, macs, params, latency = profiler.get_model_profile(
            model=model,
            input_shape=input_shape,
            print_profile=False,
            detailed=True,
            module_depth=-1,
            top_modules=1,
            warm_up=warmup,
            as_string=False
        )
        del sys.modules['profiler']
        results.append((flops/10**9, macs/10**9, params/10**3, latency*10**3))
    df = pd.DataFrame(results, columns=['FLOPs', 'MACs', 'Params', 'Latency'])
    return df
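A variant of this workaround that avoids rebuilding the model by hand would be to profile a deep copy in each round. The sketch below is untested against the DeepSpeed profiler; `profile_fresh_copies` and `profile_fn` are hypothetical names. It only helps if the leftover state lives on the model object itself. If the state lives inside the profiler module, re-importing would presumably still be needed.

```python
import copy


def profile_fresh_copies(model, profile_fn, num_tests=3):
    """Call profile_fn on a new deep copy of `model` in every round.

    Anything profile_fn attaches to or changes on the copy is thrown away
    with the copy, so the original `model` is never modified.
    """
    results = []
    for _ in range(num_tests):
        fresh = copy.deepcopy(model)
        results.append(profile_fn(fresh))
    return results
```

Here `profile_fn` would be a small wrapper that calls `profiler.get_model_profile` with `model=fresh` and the same keyword arguments as above.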
I'm using the code from https://github.com/KimmiShi/DeepSpeed/tree/flops_profiler_attn because I want to count the FLOPs of the @ (matmul) operation in transformer-based models, which the released version doesn't support.
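For reference, the theoretical count I compare against for a single @ can be sketched as follows, using the common convention of 2 FLOPs per multiply-accumulate. The helper names and the example shapes are made up for illustration, and the sketch assumes both operands have identical batch dimensions (no broadcasting).

```python
import math


def matmul_macs(a_shape, b_shape):
    """MACs for a @ b with shapes (..., m, k) and (..., k, n)."""
    *a_batch, m, k = a_shape
    *b_batch, k2, n = b_shape
    if k != k2:
        raise ValueError("inner dimensions do not match")
    if a_batch != b_batch:
        raise ValueError("this sketch assumes identical batch dimensions")
    # One multiply-accumulate per (row, inner, column) triple, per batch element.
    return math.prod(a_batch) * m * k * n


def matmul_flops(a_shape, b_shape):
    """FLOPs for a @ b, counting each MAC as one multiply plus one add."""
    return 2 * matmul_macs(a_shape, b_shape)


# Hypothetical attention scores q @ k^T: batch 1, 4 heads, 49 tokens, head dim 32.
print(matmul_macs((1, 4, 49, 32), (1, 4, 32, 49)))   # 307328
print(matmul_flops((1, 4, 49, 32), (1, 4, 32, 49)))  # 614656
```

This count depends only on the shapes, so for a fixed input size it should be the same in every round.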