torchtitan
[405B] Add performance data for 405B model
In this PR, we mainly measure the performance and loss curves of the 405B model with some optimization techniques we recently developed. We also want to log the actual peak TFLOPS used in the MFU calculation so that it can be cross-validated. In addition, we are currently using the wrong peak FLOPS for H100 MFU calculations, because our machines are H100 NVL, a variant with more memory.
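For reference, here is a minimal sketch of how the assumed peak FLOPS feeds into MFU and why logging it helps cross-validation. The names `PEAK_FLOPS`, `compute_mfu`, and `format_mfu_log` are hypothetical and are not torchtitan's API. The peak values are approximate dense BF16 figures assumed for illustration and should be checked against the vendor datasheet for the exact hardware variant.

```python
# Approximate dense BF16 peak throughput per GPU, in FLOP/s.
# These are assumed illustrative values, not taken from this PR.
PEAK_FLOPS = {
    "A100": 312e12,
    "H100 SXM": 989e12,
    "H100 NVL": 835e12,
    "H100 PCIe": 756e12,
}


def compute_mfu(tokens_per_second_per_gpu: float,
                flops_per_token: float,
                peak_flops: float) -> float:
    """Return model FLOPS utilization as a fraction in [0, 1]."""
    # Achieved throughput is the model FLOPs actually executed per second.
    achieved_flops = tokens_per_second_per_gpu * flops_per_token
    return achieved_flops / peak_flops


def format_mfu_log(mfu: float, peak_flops: float) -> str:
    """Format MFU together with the peak value it was computed against."""
    return f"mfu: {mfu * 100:.2f}% (peak: {peak_flops / 1e12:.0f} TFLOPS)"


if __name__ == "__main__":
    # Hypothetical workload numbers, chosen only to show the effect.
    tokens_per_second_per_gpu = 100.0
    flops_per_token = 2.0e12

    for variant in ("H100 SXM", "H100 NVL"):
        peak = PEAK_FLOPS[variant]
        mfu = compute_mfu(tokens_per_second_per_gpu, flops_per_token, peak)
        print(variant, "->", format_mfu_log(mfu, peak))
```

The same measured throughput yields a different MFU depending on which peak value is assumed, so printing the peak next to the MFU makes a mismatched hardware variant easy to spot.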