Check whether the calculation of the 'add_time' is incorrect?
I found the _get_block_execution_time() function in the vidur/entities/execution_time.py path computes 'add_time' only once.
def _get_block_execution_time(self) -> float:
return (
self._get_attention_layer_execution_time()
+ self._get_mlp_layer_execution_time()
+ self._add_time
)
But in other places, for example, under the vidur/metrics/metrics_store.py path. 'add_time' is calculated twice
self._push_metric(
OperationMetrics.ADD, batch_id, execution_time.add_time * 2
)
Is this a bug?
@yipengLeo thanks for pointing this out, there should be two add operations per block in llama models as far as i remember, let me look into this
Hi @yipengLeo, there is indeed a bug here. However, the impact is relatively little as the add_time is a small percentage of the time.
For all the models supported right now except phi-2, there are two add operations in each layer.
(only phi-2 has post_attn_norm: false in its model config)
1st add: https://github.com/microsoft/vidur/blob/main/vidur/profiling/mlp/mlp_impl.py#L172
2nd add: https://github.com/microsoft/vidur/blob/main/vidur/profiling/mlp/mlp_impl.py#L179
So, if we ignore phi-2, it would be better to add add_time twice. A proper solution would be to consider these slight differences between model architectures. Please feel free to raise a PR for fixing this!