vidur Check whether the calculation of the 'add

I found the _get_block_execution_time() function in the vidur/entities/execution_time.py path computes 'add_time' only once.

def _get_block_execution_time(self) -> float:
    return (
        self._get_attention_layer_execution_time()
        + self._get_mlp_layer_execution_time()
        + self._add_time
    )

But in other places, for example, under the vidur/metrics/metrics_store.py path. 'add_time' is calculated twice

self._push_metric(
OperationMetrics.ADD, batch_id, execution_time.add_time * 2
)

Is this a bug?

Sep 11 '24 11:09 yipengLeo

@yipengLeo thanks for pointing this out, there should be two add operations per block in llama models as far as i remember, let me look into this

Sep 14 '24 07:09 AgrawalAmey

Hi @yipengLeo, there is indeed a bug here. However, the impact is relatively little as the add_time is a small percentage of the time.

For all the models supported right now except phi-2, there are two add operations in each layer. (only phi-2 has post_attn_norm: false in its model config) 1st add: https://github.com/microsoft/vidur/blob/main/vidur/profiling/mlp/mlp_impl.py#L172 2nd add: https://github.com/microsoft/vidur/blob/main/vidur/profiling/mlp/mlp_impl.py#L179

So, if we ignore phi-2, it would be better to add add_time twice. A proper solution would be to consider these slight differences between model architectures. Please feel free to raise a PR for fixing this!

Sep 18 '24 06:09 nitinkedia7

Check whether the calculation of the 'add_time' is incorrect?