ml-engineering
Max Achievable TFLOP/s on H100 without warmup
As discussed on Slack, since we are trying to find the max achievable FLOPs for each accelerator, I changed warmup to 0.
Without any magic flags, on NVIDIA driver 550 with the NGC 24.07 image, I get 813 TFLOP/s consistently.
Note that this script only looks for the best iteration, not the mean or median.
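To make the best-iteration vs mean/median distinction concrete, here is a minimal sketch of the idea. It is not the script from this PR: `benchmark`, `flops_per_iter`, `num_iters` and `warmup_iters` are hypothetical names, and the workload is a CPU stand-in so the snippet runs anywhere.

```python
import statistics
import time


def benchmark(workload, flops_per_iter, num_iters=10, warmup_iters=0):
    """Time `workload` repeatedly and report best, mean and median TFLOP/s.

    Hypothetical helper for illustration only. On a GPU the device would
    also need to be synchronized before each clock read.
    """
    # Untimed warmup runs; with warmup_iters=0 every run is measured.
    for _ in range(warmup_iters):
        workload()

    tflops = []
    for _ in range(num_iters):
        start = time.perf_counter()
        workload()
        elapsed = time.perf_counter() - start
        tflops.append(flops_per_iter / elapsed / 1e12)

    return {
        "best": max(tflops),  # the single fastest iteration
        "mean": statistics.mean(tflops),
        "median": statistics.median(tflops),
    }


def cpu_stand_in(n=100_000):
    # Stand-in workload: one multiply and one add per element.
    return sum(i * i for i in range(n))


if __name__ == "__main__":
    result = benchmark(cpu_stand_in, flops_per_iter=2 * 100_000, num_iters=10)
    for name, value in result.items():
        print(f"{name}: {value:.6f} TFLOP/s")
```

Because the best iteration is a maximum over the samples, it can never be lower than the mean or median, so slow early iterations that a warmup would normally hide do not pull the reported number down.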