iree
[DNS] Test tile size selection
Abbreviated Benchmark Summary
@ commit d19f2e43a24d1425dcb4fa4537295af0f7e613fa (vs. base 21d01535d2154216731196e12508b12fa3b8a095)
Data-Tiling Comparison Table
| Name | No-DT (baseline) latency (ms) | DT-Only latency (ms) | DT-UK latency (ms) |
|---|---|---|---|
| BertLargeTF(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 1522.790 (1.0X) | N/A | 353.213 (4.3X) |
| DeepLabV3_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 6.412 (1.0X) | N/A | 9.388 (0.7X) |
| EfficientNetV2STF(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 51.261 (1.0X) | N/A | 52.567 (1.0X) |
| EfficientNet_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 5.760 (1.0X) | N/A | 5.953 (1.0X) |
| Falcon7bInt4GptqPT(linalg) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 36107.850 (1.0X) | N/A | 7653.113 (4.7X) |
| GPT2_117M_TF_1X1XI32(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 14.190 (1.0X) | N/A | 10.100 (1.4X) |
| MiniLML12H384Uncased(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 17.708 (1.0X) | N/A | 14.414 (1.2X) |
| MobileBertSquad_fp16(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 45.386 (1.0X) | N/A | 59.880 (0.8X) |
| MobileBertSquad_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 46.609 (1.0X) | N/A | 61.058 (0.8X) |
| MobileBertSquad_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 110.948 (1.0X) | N/A | 59.364 (1.9X) |
| MobileNetV1_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 6.910 (1.0X) | N/A | 5.085 (1.4X) |
| MobileNetV2_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 3.706 (1.0X) | N/A | 5.405 (0.7X) |
| MobileNetV2_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 5.739 (1.0X) | N/A | 5.380 (1.1X) |
| MobileNetV3Small_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 2.903 (1.0X) | N/A | 3.209 (0.9X) |
| MobileSSD_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 8.469 (1.0X) | N/A | 10.479 (0.8X) |
| PersonDetect_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 0.713 (1.0X) | N/A | 0.590 (1.2X) |
| PoseNet_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 4.598 (1.0X) | N/A | 5.682 (0.8X) |
| BertForMaskedLMTF(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 380.354 (1.0X) | N/A | 174.427 (2.2X) |
| DeepLabV3_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 26.151 (1.0X) | N/A | 33.277 (0.8X) |
| EfficientNetV2STF(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 263.001 (1.0X) | N/A | 238.269 (1.1X) |
| EfficientNet_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 26.156 (1.0X) | N/A | 15.212 (1.7X) |
| Falcon7bGptqPT(linalg) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 35749.724 (1.0X) | N/A | 7514.987 (4.8X) |
| GPT2_117M_TF_1X4XI32(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 16.861 (1.0X) | N/A | 10.542 (1.6X) |
| MiniLML12H384Uncased(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 75.352 (1.0X) | N/A | 57.885 (1.3X) |
| MobileBertSquad_fp16(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 180.472 (1.0X) | N/A | 190.106 (0.9X) |
| MobileBertSquad_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 183.020 (1.0X) | N/A | 194.098 (0.9X) |
| MobileBertSquad_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 484.107 (1.0X) | N/A | 210.046 (2.3X) |
| MobileNetV1_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 26.074 (1.0X) | N/A | 19.829 (1.3X) |
| MobileNetV2_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 11.082 (1.0X) | N/A | 13.086 (0.8X) |
| MobileNetV2_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 21.070 (1.0X) | N/A | 13.034 (1.6X) |
| MobileNetV3Small_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_sync(embedded_elf)[full-inference,default-flags] with default @ c2-standard-60[cpu] | 2.873 (1.0X) | N/A | 3.144 (0.9X) |
| MobileSSD_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 32.363 (1.0X) | N/A | 33.996 (1.0X) |
| PersonDetect_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_sync(embedded_elf)[full-inference,default-flags] with default @ c2-standard-60[cpu] | 0.643 (1.0X) | N/A | 0.533 (1.2X) |
| PoseNet_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 17.846 (1.0X) | N/A | 20.786 (0.9X) |
| DeepLabV3_fp32(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu] local_sync(embedded_elf)[full-inference,default-flags] with default @ pixel-6-pro[big-cores] | 60.099 (1.0X) | N/A | 37.395 (1.6X) |
| DeepLabV3_fp32(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] | 61.573 (1.0X) | N/A | 36.635 (1.7X) |
| DeepLabV3_fp32(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu] local_task(embedded_elf)[2-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] | 38.389 (1.0X) | N/A | 24.171 (1.6X) |
| GPT2_117M_TF_1X1XI32(stablehlo) [armv8.2-a-generic-linux_android29-llvm_cpu] local_sync(embedded_elf)[full-inference,default-flags] with default @ pixel-6-pro[big-cores] | 93.111 (1.0X) | N/A | 21.179 (4.4X) |
| GPT2_117M_TF_1X1XI32(stablehlo) [armv8.2-a-generic-linux_android29-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] | 93.816 (1.0X) | N/A | 22.439 (4.2X) |
| GPT2_117M_TF_1X1XI32(stablehlo) [armv8.2-a-generic-linux_android29-llvm_cpu] local_task(embedded_elf)[2-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] | 57.057 (1.0X) | N/A | 22.061 (2.6X) |
| GPT2_117M_TF_1X4XI32(stablehlo) [armv8.2-a-generic-linux_android29-llvm_cpu] local_sync(embedded_elf)[full-inference,default-flags] with default @ pixel-6-pro[big-cores] | 133.311 (1.0X) | N/A | 29.381 (4.5X) |
| GPT2_117M_TF_1X4XI32(stablehlo) [armv8.2-a-generic-linux_android29-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] | 133.240 (1.0X) | N/A | 31.846 (4.2X) |
| GPT2_117M_TF_1X4XI32(stablehlo) [armv8.2-a-generic-linux_android29-llvm_cpu] local_task(embedded_elf)[2-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] | 81.029 (1.0X) | N/A | 28.372 (2.9X) |
| MobileBertSquad_fp32(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu] local_sync(embedded_elf)[full-inference,default-flags] with default @ pixel-6-pro[big-cores] | 752.204 (1.0X) | N/A | 375.714 (2.0X) |
| MobileBertSquad_fp32(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] | 763.623 (1.0X) | N/A | 376.838 (2.0X) |
| MobileBertSquad_fp32(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu] local_task(embedded_elf)[2-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] | 432.955 (1.0X) | N/A | 229.811 (1.9X) |
| MobileBertSquad_int8(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu] local_sync(embedded_elf)[full-inference,default-flags] with default @ pixel-6-pro[big-cores] | 907.408 (1.0X) | N/A | 263.260 (3.4X) |
| MobileBertSquad_int8(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] | 914.225 (1.0X) | N/A | 266.838 (3.4X) |
| MobileBertSquad_int8(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu] local_task(embedded_elf)[2-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] | 507.399 (1.0X) | N/A | 157.620 (3.2X) |
| Vit_int8(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu] local_sync(embedded_elf)[full-inference,default-flags] with default @ pixel-6-pro[big-cores] | 2250.364 (1.0X) | N/A | 835.165 (2.7X) |
| Vit_int8(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] | 2252.883 (1.0X) | N/A | 830.509 (2.7X) |
| Vit_int8(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu] local_task(embedded_elf)[2-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] | 1229.292 (1.0X) | N/A | 475.837 (2.6X) |
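
The multiplier in parentheses appears to be the No-DT baseline latency divided by the latency of the configuration in that column, rounded to one decimal place, so a value above 1.0X means the configuration is faster than the baseline. A minimal sketch of that reading, checked against three rows copied from the table above (the helper name `speedup` is hypothetical and this is not the benchmark tooling's own code):

```python
def speedup(baseline_ms: float, candidate_ms: float) -> float:
    """Baseline latency divided by candidate latency; > 1.0 means the candidate is faster."""
    if candidate_ms <= 0:
        raise ValueError("candidate latency must be positive")
    return baseline_ms / candidate_ms


# (No-DT latency, DT-UK latency) pairs in ms, copied from the table above.
rows = {
    "BertLargeTF, x86_64, 8-thread": (1522.790, 353.213),
    "DeepLabV3_fp32, x86_64, 8-thread": (6.412, 9.388),
    "GPT2_117M_TF_1X1XI32, pixel-6-pro, local_sync": (93.111, 21.179),
}

for name, (no_dt, dt_uk) in rows.items():
    # Format the ratio the same way the table does: one decimal place plus "X".
    print(f"{name}: {speedup(no_dt, dt_uk):.1f}X")
```

Under this assumption the rows reproduce the 4.3X, 0.7X, and 4.4X entries shown in the table.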
Regressed Latencies 🚩
| Benchmark Name | Average Latency (ms) | Median Latency (ms) | Latency Standard Deviation (ms) |
|---|---|---|---|
| GPT2\_117M\_TF\_1X1XI32(stablehlo) [armv8.2-a-generic-linux\_android29-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[2-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] | 57.057 (vs. 53.300, 7.05%↑) | 56.960 | 0.317 |
| GPT2\_117M\_TF\_1X4XI32(stablehlo) [armv8.2-a-generic-linux\_android29-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[2-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] | 81.029 (vs. 75.779, 6.93%↑) | 80.798 | 0.863 |
Improved Latencies 🎉
| Benchmark Name | Average Latency (ms) | Median Latency (ms) | Latency Standard Deviation (ms) |
|---|---|---|---|
| EfficientNet\_int8(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 26.156 (vs. 29.516, 11.38%↓) | 26.073 | 0.248 |
| GPT2\_117M\_TF\_1X4XI32(stablehlo) [armv8.2-a-generic-linux\_android29-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[1-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] | 31.846 (vs. 35.234, 9.62%↓) | 32.176 | 0.880 |
| DeepLabV3\_fp32(tflite) [armv8.2-a-generic-linux\_android29-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[2-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] | 24.171 (vs. 26.461, 8.65%↓) | 24.494 | 0.827 |
[Top 3 out of 14 results shown]
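
The percentages in the latency tables appear to be the change of the current value relative to the base commit's value, with ↑ marking an increase (regression) and ↓ a decrease (improvement). A short sketch under that assumption, using one regressed and one improved row from the tables above (the function names `percent_change` and `describe` are hypothetical, not part of the benchmark tooling):

```python
def percent_change(current: float, base: float) -> float:
    """Signed change of `current` relative to `base`, in percent."""
    if base == 0:
        raise ValueError("base value must be non-zero")
    return (current - base) / base * 100.0


def describe(current: float, base: float) -> str:
    """Render a value in the 'X (vs. Y, Z%↑/↓)' style used in the tables."""
    change = percent_change(current, base)
    # Assumption: an increase gets an up arrow, anything else a down arrow.
    arrow = "↑" if change > 0 else "↓"
    return f"{current:.3f} (vs. {base:.3f}, {abs(change):.2f}%{arrow})"


# GPT2_117M_TF_1X1XI32, no-dt, 2-thread on pixel-6-pro (regressed).
print(describe(57.057, 53.300))
# EfficientNet_int8, no-dt, 1-thread on c2-standard-60 (improved).
print(describe(26.156, 29.516))
```

Both calls reproduce the corresponding table cells, which is consistent with the base commit's value being the denominator.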
Improved Total Dispatch Sizes 🎉
| Benchmark Name | Total Dispatch Size (bytes) |
|---|---|
| MobileSSD\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt,compile-stats] | 296312 (vs. 335032, 11.56%↓) |
| PoseNet\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt,compile-stats] | 170328 (vs. 192216, 11.39%↓) |
| MobileNetV3Small\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt,compile-stats] | 210680 (vs. 236392, 10.88%↓) |
[Top 3 out of 5 results shown]
For more information: