Megatron style TFLOPs Calculation

Open abhinavgoel95 opened this issue 3 months ago • 2 comments

@rwitten this is a draft.

This type of change would be specific to a few dense transformer models (e.g., Gemma, Llama, GPT). It wouldn't work with MoE or with some newer architectures.

I was thinking that walking through the train step and calculating the FLOPs layer by layer would be a very intrusive change.
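For reference, here is a minimal sketch of what a closed-form, Megatron-style estimate could look like. It is not the code in this PR and not a MaxText API: the function names and parameters (`megatron_style_flops`, `tflops_per_device`, `recompute_activations`) are hypothetical. It assumes a dense GPT-style layer with standard multi-head attention and a 4h-wide MLP, which is where the constants 24 and 4 come from; gated MLPs (as in Llama or Gemma) and MoE layers would need different constants.

```python
def megatron_style_flops(batch, seq_len, num_layers, hidden, vocab,
                         recompute_activations=False):
    """Approximate FLOPs for one training step of a dense GPT-style model.

    Assumes (hypothetically) standard multi-head attention and an MLP with
    hidden size 4 * hidden. Not valid for MoE or gated-MLP variants.
    """
    # Forward FLOPs of one transformer layer:
    #   24*B*s*h^2 : QKV projections, attention output projection, 4h MLP
    #   4*B*s^2*h  : attention scores and the attention-weighted sum
    layer_fwd = (24 * batch * seq_len * hidden**2
                 + 4 * batch * seq_len**2 * hidden)

    # Forward FLOPs of the final logit projection (hidden -> vocab).
    logits_fwd = 2 * batch * seq_len * hidden * vocab

    # Backward costs about 2x forward, so forward + backward = 3 passes.
    # Full activation recomputation adds one more forward pass per layer.
    layer_passes = 4 if recompute_activations else 3

    # The logit layer is counted as forward + backward only (3 passes).
    return layer_passes * num_layers * layer_fwd + 3 * logits_fwd


def tflops_per_device(step_flops, step_time_s, num_devices):
    """Convert FLOPs per step into achieved TFLOP/s per device."""
    return step_flops / (step_time_s * num_devices * 1e12)
```

With `recompute_activations=True` this expands to the familiar `96*B*s*l*h^2 * (1 + s/(6h) + V/(16*l*h))` form. Because it only needs the model config and the measured step time, it avoids touching the train step, at the cost of being wrong for any architecture that does not match the assumptions above.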

What do you think?

abhinavgoel95, Mar 20 '24 03:03

Made the changes as requested in the meeting @rwitten

abhinavgoel95, Apr 01 '24 16:04

cc @rwitten following up on this

abhinavgoel95, Apr 17 '24 22:04