maxtext Megatron style TFLOPs Calculation

Open abhinavgoel95 opened this issue 11 months ago • 2 comments

@rwitten this is a draft.

This type of change would be specific to a few dense transformer models (e.g., Gemma, Llama, GPT). It wouldn't work with MoE or with some newer architectures.
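For context, here is a minimal sketch of what a Megatron-style estimate looks like. It uses the closed-form count from the Megatron-LM paper for a standard GPT block (multi-head attention, 4x MLP, no gating), which is an assumption on my part and not necessarily what this change implements. The function and parameter names are hypothetical, and Gemma or Llama would need adjusted per-layer terms (gated MLP, grouped-query attention).

```python
def megatron_flops_per_step(batch_size, seq_len, num_layers, hidden_size,
                            vocab_size, activation_recompute=True):
    """Estimate model FLOPs for one training step of a dense GPT-style model.

    Assumes the standard GPT block (MHA, MLP with 4x expansion, no gating).
    Hypothetical helper for illustration; not part of maxtext.
    """
    b, s, l, h, v = batch_size, seq_len, num_layers, hidden_size, vocab_size

    # Forward FLOPs of one transformer layer:
    # 24*b*s*h^2 for the QKV, output-projection and MLP matmuls,
    # 4*b*s^2*h for the attention scores and the attention-weighted values.
    layer_fwd = 24 * b * s * h * h + 4 * b * s * s * h

    # Forward FLOPs of the final logit projection (hidden -> vocab).
    logit_fwd = 2 * b * s * h * v

    # Backward costs about 2x forward. Full activation recomputation adds
    # one more forward pass through the layers (but not the logit layer).
    layer_passes = 4 if activation_recompute else 3
    return layer_passes * l * layer_fwd + 3 * logit_fwd


def tflops_per_device(flops_per_step, step_time_s, num_devices):
    """Convert a per-step FLOP count into achieved TFLOP/s per device."""
    return flops_per_step / (step_time_s * num_devices * 1e12)
```

With recomputation enabled this reduces to 96·B·s·l·h²·(1 + s/(6h) + V/(16·l·h)). Nothing in it accounts for expert routing, which is why it would not carry over to MoE.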

I was thinking that walking through the train step and calculating the FLOPs layer by layer would be a very intrusive change.

What do you think?

Mar 20 '24 03:03 abhinavgoel95

Made the changes as requested in the meeting @rwitten

Apr 01 '24 16:04 abhinavgoel95

cc @rwitten following up on this

Apr 17 '24 22:04 abhinavgoel95
