MIOpen
Rather low training performance on "AI Benchmark" with MI100
Hi,
To check whether my machine with an MI100 GPU is set up correctly, I ran "AI Benchmark" (https://ai-benchmark.com/ranking_deeplearning_detailed.html). Inference speed is pretty good, but training speed on some sub-benchmarks is far below what I would expect.
Installation instructions are at https://ai-benchmark.com/alpha.html .
Results I get:
>> AI-Benchmark-v.0.1.2
>> Let the AI Games begin..
* TF Version: 2.11.0
* Platform: Linux-5.15.0-67-generic-x86_64-with-glibc2.35
* CPU: N/A
* CPU RAM: 31 GB
* GPU/0: AMD Instinct MI100
* GPU RAM: 31.0 GB
* CUDA Version: N/A
* CUDA Build: N/A
The benchmark is running...
The tests might take up to 20 minutes
Please don't interrupt the script
1/19. MobileNet-V2
1.1 - inference | batch=50, size=224x224: 33.0 ± 1.1 ms
1.2 - training | batch=50, size=224x224: 2560 ± 12 ms
2/19. Inception-V3
2.1 - inference | batch=20, size=346x346: 39.8 ± 0.9 ms
2.2 - training | batch=20, size=346x346: 1989 ± 9 ms
3/19. Inception-V4
3.1 - inference | batch=10, size=346x346: 40.3 ± 2.1 ms
3.2 - training | batch=10, size=346x346: 1482 ± 4 ms
4/19. Inception-ResNet-V2
4.1 - inference | batch=10, size=346x346: 53.0 ± 4.4 ms
4.2 - training | batch=8, size=346x346: 950 ± 9 ms
5/19. ResNet-V2-50
5.1 - inference | batch=10, size=346x346: 25.5 ± 1.1 ms
5.2 - training | batch=10, size=346x346: 66.5 ± 0.9 ms
6/19. ResNet-V2-152
6.1 - inference | batch=10, size=256x256: 35.0 ± 2.2 ms
6.2 - training | batch=10, size=256x256: 108 ± 6 ms
7/19. VGG-16
7.1 - inference | batch=20, size=224x224: 55.9 ± 0.9 ms
7.2 - training | batch=2, size=224x224: 71.1 ± 0.7 ms
8/19. SRCNN 9-5-5
8.1 - inference | batch=10, size=512x512: 50.7 ± 0.6 ms
8.2 - inference | batch=1, size=1536x1536: 44.8 ± 0.7 ms
8.3 - training | batch=10, size=512x512: 111 ± 2 ms
9/19. VGG-19 Super-Res
9.1 - inference | batch=10, size=256x256: 52.0 ± 0.9 ms
9.2 - inference | batch=1, size=1024x1024: 86.9 ± 0.7 ms
9.3 - training | batch=10, size=224x224: 110.3 ± 0.7 ms
10/19. ResNet-SRGAN
10.1 - inference | batch=10, size=512x512: 64.4 ± 0.8 ms
10.2 - inference | batch=1, size=1536x1536: 60.4 ± 0.9 ms
10.3 - training | batch=5, size=512x512: 90.0 ± 1.3 ms
11/19. ResNet-DPED
11.1 - inference | batch=10, size=256x256: 63.2 ± 0.7 ms
11.2 - inference | batch=1, size=1024x1024: 110.4 ± 0.8 ms
11.3 - training | batch=15, size=128x128: 93.8 ± 0.9 ms
12/19. U-Net
12.1 - inference | batch=4, size=512x512: 108.6 ± 0.7 ms
12.2 - inference | batch=1, size=1024x1024: 118.6 ± 0.7 ms
12.3 - training | batch=4, size=256x256: 142.5 ± 0.9 ms
13/19. Nvidia-SPADE
13.1 - inference | batch=5, size=128x128: 53.1 ± 0.9 ms
13.2 - training | batch=1, size=128x128: 75.2 ± 3.0 ms
14/19. ICNet
14.1 - inference | batch=5, size=1024x1536: 161 ± 3 ms
14.2 - training | batch=10, size=1024x1536: 426 ± 10 ms
15/19. PSPNet
15.1 - inference | batch=5, size=720x720: 290.8 ± 0.8 ms
15.2 - training | batch=1, size=512x512: 393 ± 2 ms
16/19. DeepLab
16.1 - inference | batch=2, size=512x512: 62.0 ± 1.3 ms
16.2 - training | batch=1, size=384x384: 129 ± 6 ms
17/19. Pixel-RNN
17.1 - inference | batch=50, size=64x64: 496 ± 16 ms
17.2 - training | batch=10, size=64x64: 2764 ± 98 ms
18/19. LSTM-Sentiment
18.1 - inference | batch=100, size=1024x300: 591 ± 27 ms
18.2 - training | batch=10, size=1024x300: 1589 ± 195 ms
19/19. GNMT-Translation
19.1 - inference | batch=1, size=1x20: 197 ± 6 ms
Device Inference Score: 17625
Device Training Score: 9906
Device AI Score: 27531
For more information and results, please visit http://ai-benchmark.com/alpha
I installed the MIOpen kernels for gfx908 through the package manager. I am on Ubuntu 22.04.2 LTS with ROCm 5.4.3 and TensorFlow 2.11.
I would appreciate it if you could tell me whether this is the performance I should expect as of now, or whether some tuning could improve it. Given that the training scores are not that great compared to the inference ones, I feel like something is wrong and training should be faster.
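To put rough numbers on that gap, here is a minimal sketch in plain Python (not part of AI Benchmark) that computes the per-sample training-to-inference time ratio for the first six tests, using the timings copied from the log above. The names `results` and `train_to_inference_ratio` are made up for illustration. Dividing by batch size is my own normalization, since Inception-ResNet-V2 uses different batch sizes for inference and training.

```python
# (inference_ms, inference_batch, training_ms, training_batch),
# copied from tests 1-6 of the log above; image sizes match within each test.
results = {
    "MobileNet-V2":        (33.0, 50, 2560.0, 50),
    "Inception-V3":        (39.8, 20, 1989.0, 20),
    "Inception-V4":        (40.3, 10, 1482.0, 10),
    "Inception-ResNet-V2": (53.0, 10, 950.0, 8),
    "ResNet-V2-50":        (25.5, 10, 66.5, 10),
    "ResNet-V2-152":       (35.0, 10, 108.0, 10),
}


def train_to_inference_ratio(inf_ms, inf_batch, train_ms, train_batch):
    """Per-sample training time divided by per-sample inference time."""
    return (train_ms / train_batch) / (inf_ms / inf_batch)


ratios = {name: train_to_inference_ratio(*v) for name, v in results.items()}

# Print the tests from the largest ratio to the smallest.
for name, ratio in sorted(ratios.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:22s} {ratio:6.1f}x")
```

With these numbers, MobileNet-V2 and the Inception variants come out at tens of times slower per sample in training than in inference, while the two ResNet-V2 models stay around 3x. That contrast is what makes me suspect a problem with specific layers or kernels rather than with the setup as a whole.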
Best regards, Epliz