
Max Achievable TFLOP/s on H100 without warmup

Open OrenLeung opened this issue 6 months ago • 0 comments

As discussed on Slack, since we are trying to find the max achievable FLOPS for each accelerator, I changed warmup to 0.

Without any magic flags, on NVIDIA driver 550 with the NGC 24.07 image, I get 813 TFLOP/s consistently.

Note that this script is just trying to find the best iteration, not the mean/median.
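To make the best-iteration vs mean/median distinction concrete, here is a minimal sketch. It is not the script from this issue: the function names (`matmul_flops`, `summarize_tflops`) and the per-iteration times are made up for illustration, and it assumes the benchmark times a single `(m x k) @ (k x n)` matmul per iteration.

```python
import statistics


def matmul_flops(m, n, k):
    # An (m x k) @ (k x n) matmul does m*n*k multiply-adds, i.e. 2*m*n*k FLOPs.
    return 2 * m * n * k


def summarize_tflops(iter_times_s, m, n, k, warmup=0):
    # Hypothetical helper: drop the first `warmup` iterations, then convert
    # each remaining per-iteration time (in seconds) to TFLOP/s.
    timed = iter_times_s[warmup:]
    if not timed:
        raise ValueError("no timed iterations left after warmup")
    per_iter = [matmul_flops(m, n, k) / t / 1e12 for t in timed]
    return {
        "best": max(per_iter),  # fastest single iteration
        "mean": statistics.mean(per_iter),
        "median": statistics.median(per_iter),
    }


# Made-up timings (seconds), with a slow first iteration.
times = [0.004, 0.002, 0.002, 0.001]
stats = summarize_tflops(times, 1024, 1024, 1024, warmup=0)
print(stats)
```

In this sketch the best value is unaffected by slow early iterations, so dropping warmup does not lower it, while the mean and median do shift depending on which iterations are counted.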

OrenLeung, Aug 09 '24 02:08