DeepSpeedExamples The inaccurate flop results after several rounds

Open BitCalSaul opened this issue 1 year ago • 1 comments

Hi, I tried to use the method "get_model_profile" to get the latency and FLOPs of my model. To reduce the influence of randomness, I call this method several times in a for loop and then average the results. However, I found that the results for every round after the first are incorrect and far from the theoretical value. As the figure below shows, the FLOPs increase with each round, which cannot be right, since I feed the same input size to the model every time. This is the code:

def test_model(model, input_shape, warmup=20, num_tests=1000):
    results = []
    
    for _ in range(num_tests):
        #from profiler import get_model_profile
        flops, macs, params, latency = profiler.get_model_profile(
            model=model,
            input_shape=input_shape,
            print_profile=False,
            detailed=True,
            module_depth=-1,
            top_modules=1,
            warm_up=warmup,
            as_string=False
        )
        del sys.modules['profiler']
        results.append((flops/10**9, macs/10**9, params/10**3, latency*10**3))

    df = pd.DataFrame(results, columns=['FLOPs', 'MACs', 'Params', 'Latency'])
    return df

df_swin = test_model(Swin, (batch_size, math.prod(input_resolution), dim), warmup=warmup, num_tests=num_tests)

I then modified the code and found that if I create the model again in every iteration and import the profiler again, the results are correct, as shown in the figure below.

The modified code is as follows.

def test_model(input_shape, warmup=20, num_tests=1000):
    results = []
    for _ in range(num_tests):
        #from profiler import get_model_profile
        import profiler
        model = MySwinTransformerModel(dim, input_resolution, num_heads, window_size, mlp_ratio, depth).to(device) 
        # model = MyTensorizedTransformerModel(dim, input_resolution, num_heads, n_proj, mlp_ratio, depth).to(device) 
        flops, macs, params, latency = profiler.get_model_profile(
            model=model,
            input_shape=input_shape,
            print_profile=False,
            detailed=True,
            module_depth=-1,
            top_modules=1,
            warm_up=warmup,
            as_string=False
        )
        del sys.modules['profiler']
        results.append((flops/10**9, macs/10**9, params/10**3, latency*10**3))

    df = pd.DataFrame(results, columns=['FLOPs', 'MACs', 'Params', 'Latency'])
    return df
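The workaround above (a new model every round, then averaging) can also be sketched without the sys.modules manipulation. This is only a minimal sketch under assumptions: make_model and profile_fn are hypothetical callables that do not appear in the original code. make_model() would build a new model, and profile_fn(model) would wrap profiler.get_model_profile and return the same (flops, macs, params, latency) tuple. The check for changing FLOPs is an addition for illustration.

```python
from statistics import mean


def average_profile(make_model, profile_fn, num_tests=5):
    """Profile a freshly built model each round and average the metrics.

    make_model and profile_fn are hypothetical callables supplied by the
    caller: make_model() returns a new model instance, and profile_fn(model)
    returns a (flops, macs, params, latency) tuple.
    """
    rounds = []
    for _ in range(num_tests):
        model = make_model()  # new instance, so no state carries over
        rounds.append(profile_fn(model))

    flops_values = {r[0] for r in rounds}
    if len(flops_values) > 1:
        # FLOPs depend only on the architecture and the input shape, so any
        # variation between rounds suggests leftover state in the profiler.
        raise RuntimeError(f"FLOPs changed between rounds: {sorted(flops_values)}")

    names = ("FLOPs", "MACs", "Params", "Latency")
    return {name: mean(r[i] for r in rounds) for i, name in enumerate(names)}
```

Only latency should vary between rounds, so averaging is really needed for that column alone. Raising an error when the FLOPs differ makes the problem described in this issue visible right away, where otherwise it would be hidden inside an average.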

Jan 19 '24 03:01 BitCalSaul

I'm using the code from https://github.com/KimmiShi/DeepSpeed/tree/flops_profiler_attn because I want the FLOPs of the @ (matrix multiplication) operation in transformer-based models to be counted, which the released version doesn't support.
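For reference, the theoretical cost of a single @ operation can be computed by hand and compared with what the profiler reports. The helper below is a minimal sketch, not part of DeepSpeed: matmul_flops is a hypothetical name, it assumes both operands have identical batch dimensions (no broadcasting), and it uses the common convention that one multiply-accumulate counts as two FLOPs.

```python
from math import prod


def matmul_flops(a_shape, b_shape):
    """Theoretical cost of a batched `a @ b` with identical batch dimensions.

    a_shape is (..., m, k) and b_shape is (..., k, n). Returns (flops, macs),
    counting one multiply-accumulate as two floating-point operations.
    """
    *batch_a, m, k = a_shape
    *batch_b, k_b, n = b_shape
    if k != k_b or batch_a != batch_b:
        raise ValueError("shapes are not compatible for this simplified count")
    # Each of the m * n output elements needs k multiply-accumulates.
    macs = prod(batch_a) * m * n * k
    return 2 * macs, macs
```

For an attention score product such as q @ k.transpose(-2, -1), a_shape would be (batch, heads, tokens, head_dim) and b_shape would be (batch, heads, head_dim, tokens). Summing these counts over all matrix multiplications in the model gives a number that the first-round result can be checked against.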

Jan 19 '24 03:01 BitCalSaul
