[ios] Device lab: even an A/A test shows inaccurate frame render times
Use case
As a sanity check, I ran an A/A test and got this result:
═════════════════════════╡ ••• Final A/B results ••• ╞══════════════════════════
Score Average A (noise) Average B (noise) Speed-up
average_frame_build_time_millis 0.44 (0.00%) 0.49 (0.00%) 0.89x
worst_frame_build_time_millis 3.21 (0.00%) 1.93 (0.00%) 1.66x
90th_percentile_frame_build_time_millis 0.71 (0.00%) 0.87 (0.00%) 0.82x
99th_percentile_frame_build_time_millis 1.02 (0.00%) 1.49 (0.00%) 0.68x
average_frame_rasterizer_time_millis 5.50 (0.00%) 6.31 (0.00%) 0.87x
worst_frame_rasterizer_time_millis 15.90 (0.00%) 18.48 (0.00%) 0.86x
90th_percentile_frame_rasterizer_time_millis 7.91 (0.00%) 8.89 (0.00%) 0.89x
99th_percentile_frame_rasterizer_time_millis 13.24 (0.00%) 17.05 (0.00%) 0.78x
average_layer_cache_count 0.00 (0.00%) 0.00 (0.00%) NaNx
90th_percentile_layer_cache_count 0.00 (0.00%) 0.00 (0.00%) NaNx
99th_percentile_layer_cache_count 0.00 (0.00%) 0.00 (0.00%) NaNx
worst_layer_cache_count 0.00 (0.00%) 0.00 (0.00%) NaNx
average_layer_cache_memory 0.00 (0.00%) 0.00 (0.00%) NaNx
90th_percentile_layer_cache_memory 0.00 (0.00%) 0.00 (0.00%) NaNx
99th_percentile_layer_cache_memory 0.00 (0.00%) 0.00 (0.00%) NaNx
worst_layer_cache_memory 0.00 (0.00%) 0.00 (0.00%) NaNx
average_picture_cache_count 0.00 (0.00%) 0.00 (0.00%) NaNx
90th_percentile_picture_cache_count 0.00 (0.00%) 0.00 (0.00%) NaNx
99th_percentile_picture_cache_count 0.00 (0.00%) 0.00 (0.00%) NaNx
worst_picture_cache_count 0.00 (0.00%) 0.00 (0.00%) NaNx
average_picture_cache_memory 0.00 (0.00%) 0.00 (0.00%) NaNx
90th_percentile_picture_cache_memory 0.00 (0.00%) 0.00 (0.00%) NaNx
99th_percentile_picture_cache_memory 0.00 (0.00%) 0.00 (0.00%) NaNx
worst_picture_cache_memory 0.00 (0.00%) 0.00 (0.00%) NaNx
old_gen_gc_count 0.00 (0.00%) 0.00 (0.00%) NaNx
average_vsync_transitions_missed 1.00 (0.00%) 1.00 (0.00%) 1.00x
90th_percentile_vsync_transitions_missed 1.00 (0.00%) 1.00 (0.00%) 1.00x
99th_percentile_vsync_transitions_missed 1.00 (0.00%) 1.00 (0.00%) 1.00x
average_frame_request_pending_latency 11134.70 (0.00%) 10870.48 (0.00%) 1.02x
90th_percentile_frame_request_pending_latency 16426.00 (0.00%) 16514.00 (0.00%) 0.99x
99th_percentile_frame_request_pending_latency 16687.00 (0.00%) 16802.00 (0.00%) 0.99x
average_cpu_usage 23.42 (0.00%) 24.69 (0.00%) 0.95x
average_gpu_usage 9.06 (0.00%) 7.24 (0.00%) 1.25x
average_memory_usage 99.29 (0.00%) 124.23 (0.00%) 0.80x
90th_percentile_memory_usage 100.86 (0.00%) 125.78 (0.00%) 0.80x
99th_percentile_memory_usage 101.47 (0.00%) 126.06 (0.00%) 0.80x
total_ui_gc_time 2.83 (0.00%) 2.92 (0.00%) 0.97x
30hz_frame_percentage 0.00 (0.00%) 0.00 (0.00%) NaNx
60hz_frame_percentage 100.00 (0.00%) 100.00 (0.00%) 1.00x
80hz_frame_percentage 0.00 (0.00%) 0.00 (0.00%) NaNx
90hz_frame_percentage 0.00 (0.00%) 0.00 (0.00%) NaNx
120hz_frame_percentage 0.00 (0.00%) 0.00 (0.00%) NaNx
illegal_refresh_rate_frame_count 0.00 (0.00%) 0.00 (0.00%) NaNx
average_gpu_frame_time 1.03 (0.00%) 1.42 (0.00%) 0.73x
90th_percentile_gpu_frame_time 0.00 (0.00%) 0.00 (0.00%) NaNx
99th_percentile_gpu_frame_time 15.63 (0.00%) 15.63 (0.00%) 1.00x
worst_gpu_frame_time 15.63 (0.00%) 15.63 (0.00%) 1.00x
Take average_frame_build_time_millis (0.44 vs. 0.49, 0.89x) and average_frame_rasterizer_time_millis (5.50 vs. 6.31, 0.87x) as examples: there's a significant difference, even though both sides are running the same code (A/A test).
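To make the size of the gap concrete, here is a minimal sketch that recomputes the Speed-up column from the averages above. It assumes the column is simply A divided by B, which is consistent with the rows in this table but is not taken from the tool's source; the `speed_up` helper is hypothetical. Because the table values are already rounded, a recomputed ratio can differ from the printed one in the last digit.

```python
def speed_up(average_a, average_b):
    """Ratio of A to B (assumed definition of the Speed-up column).

    Returns NaN for 0/0, matching the NaNx rows in the table.
    """
    if average_b == 0:
        return float("nan") if average_a == 0 else float("inf")
    return average_a / average_b


# Averages copied from the A/A results above: (A, B).
results = {
    "average_frame_build_time_millis": (0.44, 0.49),
    "average_frame_rasterizer_time_millis": (5.50, 6.31),
    "average_layer_cache_count": (0.00, 0.00),
}

for name, (a, b) in results.items():
    ratio = speed_up(a, b)
    # For an ideal A/A test every ratio would be 1.00x.
    print(f"{name}: {ratio:.2f}x")
```

In an ideal A/A test every ratio would be 1.00x, so the distance from 1.00x is a rough measure of how much run-to-run noise the setup has.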
@hellohuanlin From my experience, this is about as good as you can do. Even carefully designed benchmarks have variations, so you need to use a mixture of approaches:
- One (or more) benchmarks that can show a gradual improvement over time.
- Local profiles that can reproduce some work not being completed.
- Luck.
For example https://flutter-flutter-perf.skia.org/e/?begin=1700807779&end=1708055000&queries=device_type%3DPixel_7_Pro%26sub_result%3D90th_percentile_frame_rasterizer_time_millis%26sub_result%3Daverage_frame_rasterizer_time_millis%26test%3Dcomplex_layout_scroll_perf_impeller__timeline_summary&requestType=0&selected=commit%3D38255%26name%3D%252Carch%253Dintel%252Cbranch%253Dmaster%252Cconfig%253Ddefault%252Cdevice_type%253DPixel_7_Pro%252Cdevice_version%253Dnone%252Chost_type%253Dlinux%252Csub_result%253D90th_percentile_frame_rasterizer_time_millis%252Ctest%253Dcomplex_layout_scroll_perf_impeller__timeline_summary%252C
Even though there are some big jumps, most of the work shows up as very gradual changes over time.
I noticed that I only ran with a/b=2. The numbers make sense if I increase the number of runs, so I will close this.
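As a rough illustration of why more runs help, the sketch below simulates an A/A comparison on synthetic data. It is not the device lab tool: the function names, the Gaussian noise model, and the 0.6 ms standard deviation are all assumptions chosen for illustration (the 5.5 ms mean echoes the rasterizer average above). Both sides draw from the same distribution, so any deviation of the ratio from 1.0 is pure noise.

```python
import random
import statistics


def simulate_aa_ratio(runs, rng, mean_ms=5.5, stddev_ms=0.6):
    """Average `runs` synthetic samples per side and return mean(A) / mean(B).

    Both sides use the same distribution, so the true ratio is 1.0.
    """
    a = [rng.gauss(mean_ms, stddev_ms) for _ in range(runs)]
    b = [rng.gauss(mean_ms, stddev_ms) for _ in range(runs)]
    return statistics.mean(a) / statistics.mean(b)


def typical_deviation(runs, trials=2000, seed=0):
    """Mean absolute distance of the A/A ratio from 1.0 over many trials."""
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    deviations = [abs(simulate_aa_ratio(runs, rng) - 1.0) for _ in range(trials)]
    return statistics.mean(deviations)


for runs in (2, 5, 20):
    print(f"runs={runs}: typical deviation from 1.00x = {typical_deviation(runs):.3f}")
```

Under these assumptions the typical deviation shrinks roughly with the square root of the number of runs, which matches the observation that the numbers make sense once the run count goes up.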
Re-opening and assigning to our team, as the issue is still valid.
> I noticed that I only did a/b=2. The numbers make sense if i increase number of runs. So will close it.

> the issue is still valid
What work are you suggesting needs to be done for this?
Sorry, I meant to re-open a different issue that was closed by the infra team. This issue was resolved.