[ios] Device lab even A/A test is not accurate in frame render time

Open hellohuanlin opened this issue 1 year ago • 1 comments

Use case

As a sanity check, I ran an A/A test and got this result:

═════════════════════════╡ ••• Final A/B results ••• ╞══════════════════════════

Score	Average A (noise)	Average B (noise)	Speed-up
average_frame_build_time_millis	0.44 (0.00%)	0.49 (0.00%)	0.89x	
worst_frame_build_time_millis	3.21 (0.00%)	1.93 (0.00%)	1.66x	
90th_percentile_frame_build_time_millis	0.71 (0.00%)	0.87 (0.00%)	0.82x	
99th_percentile_frame_build_time_millis	1.02 (0.00%)	1.49 (0.00%)	0.68x	
average_frame_rasterizer_time_millis	5.50 (0.00%)	6.31 (0.00%)	0.87x	
worst_frame_rasterizer_time_millis	15.90 (0.00%)	18.48 (0.00%)	0.86x	
90th_percentile_frame_rasterizer_time_millis	7.91 (0.00%)	8.89 (0.00%)	0.89x	
99th_percentile_frame_rasterizer_time_millis	13.24 (0.00%)	17.05 (0.00%)	0.78x	
average_layer_cache_count	0.00 (0.00%)	0.00 (0.00%)	NaNx	
90th_percentile_layer_cache_count	0.00 (0.00%)	0.00 (0.00%)	NaNx	
99th_percentile_layer_cache_count	0.00 (0.00%)	0.00 (0.00%)	NaNx	
worst_layer_cache_count	0.00 (0.00%)	0.00 (0.00%)	NaNx	
average_layer_cache_memory	0.00 (0.00%)	0.00 (0.00%)	NaNx	
90th_percentile_layer_cache_memory	0.00 (0.00%)	0.00 (0.00%)	NaNx	
99th_percentile_layer_cache_memory	0.00 (0.00%)	0.00 (0.00%)	NaNx	
worst_layer_cache_memory	0.00 (0.00%)	0.00 (0.00%)	NaNx	
average_picture_cache_count	0.00 (0.00%)	0.00 (0.00%)	NaNx	
90th_percentile_picture_cache_count	0.00 (0.00%)	0.00 (0.00%)	NaNx	
99th_percentile_picture_cache_count	0.00 (0.00%)	0.00 (0.00%)	NaNx	
worst_picture_cache_count	0.00 (0.00%)	0.00 (0.00%)	NaNx	
average_picture_cache_memory	0.00 (0.00%)	0.00 (0.00%)	NaNx	
90th_percentile_picture_cache_memory	0.00 (0.00%)	0.00 (0.00%)	NaNx	
99th_percentile_picture_cache_memory	0.00 (0.00%)	0.00 (0.00%)	NaNx	
worst_picture_cache_memory	0.00 (0.00%)	0.00 (0.00%)	NaNx	
old_gen_gc_count	0.00 (0.00%)	0.00 (0.00%)	NaNx	
average_vsync_transitions_missed	1.00 (0.00%)	1.00 (0.00%)	1.00x	
90th_percentile_vsync_transitions_missed	1.00 (0.00%)	1.00 (0.00%)	1.00x	
99th_percentile_vsync_transitions_missed	1.00 (0.00%)	1.00 (0.00%)	1.00x	
average_frame_request_pending_latency	11134.70 (0.00%)	10870.48 (0.00%)	1.02x	
90th_percentile_frame_request_pending_latency	16426.00 (0.00%)	16514.00 (0.00%)	0.99x	
99th_percentile_frame_request_pending_latency	16687.00 (0.00%)	16802.00 (0.00%)	0.99x	
average_cpu_usage	23.42 (0.00%)	24.69 (0.00%)	0.95x	
average_gpu_usage	9.06 (0.00%)	7.24 (0.00%)	1.25x	
average_memory_usage	99.29 (0.00%)	124.23 (0.00%)	0.80x	
90th_percentile_memory_usage	100.86 (0.00%)	125.78 (0.00%)	0.80x	
99th_percentile_memory_usage	101.47 (0.00%)	126.06 (0.00%)	0.80x	
total_ui_gc_time	2.83 (0.00%)	2.92 (0.00%)	0.97x	
30hz_frame_percentage	0.00 (0.00%)	0.00 (0.00%)	NaNx	
60hz_frame_percentage	100.00 (0.00%)	100.00 (0.00%)	1.00x	
80hz_frame_percentage	0.00 (0.00%)	0.00 (0.00%)	NaNx	
90hz_frame_percentage	0.00 (0.00%)	0.00 (0.00%)	NaNx	
120hz_frame_percentage	0.00 (0.00%)	0.00 (0.00%)	NaNx	
illegal_refresh_rate_frame_count	0.00 (0.00%)	0.00 (0.00%)	NaNx	
average_gpu_frame_time	1.03 (0.00%)	1.42 (0.00%)	0.73x	
90th_percentile_gpu_frame_time	0.00 (0.00%)	0.00 (0.00%)	NaNx	
99th_percentile_gpu_frame_time	15.63 (0.00%)	15.63 (0.00%)	1.00x	
worst_gpu_frame_time	15.63 (0.00%)	15.63 (0.00%)	1.00x	

Taking average_frame_build_time_millis (0.44 vs. 0.49, 0.89x) and average_frame_rasterizer_time_millis (5.50 vs. 6.31, 0.87x) as examples, there is a significant difference even though both sides are running the same code (A/A test).
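
As a rough check of how the Speed-up column relates to the two averages, the sketch below recomputes it for a few rows copied from the table. The formula (average A divided by average B) is an assumption inferred from the printed numbers, not taken from the tool's source, and `speed_up` is a hypothetical helper name.

```python
import math


def speed_up(a: float, b: float) -> float:
    """Assumed definition: average A divided by average B.

    Python raises on division by zero, so return NaN in that case.
    """
    if b == 0:
        return math.nan
    return a / b


# (average A, average B) pairs copied from the results table above.
rows = {
    "worst_frame_build_time_millis": (3.21, 1.93),
    "average_frame_rasterizer_time_millis": (5.50, 6.31),
    "average_gpu_usage": (9.06, 7.24),
    "average_layer_cache_count": (0.00, 0.00),
}

for name, (a, b) in rows.items():
    print(f"{name}: {speed_up(a, b):.2f}x")
```

Under that assumption, a value below 1.00x on a time metric means B's average was higher than A's, which should not happen systematically when A and B are the same build.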

hellohuanlin avatar Feb 15 '24 19:02 hellohuanlin

@hellohuanlin from my experience this is about as good as you can do. Even carefully designed benchmarks have variation, so you need to use a mixture of approaches:

  1. One (or more) benchmarks that can show a gradual improvement over time.
  2. Local profiles that can reproduce some work not being completed
  3. Luck.

For example https://flutter-flutter-perf.skia.org/e/?begin=1700807779&end=1708055000&queries=device_type%3DPixel_7_Pro%26sub_result%3D90th_percentile_frame_rasterizer_time_millis%26sub_result%3Daverage_frame_rasterizer_time_millis%26test%3Dcomplex_layout_scroll_perf_impeller__timeline_summary&requestType=0&selected=commit%3D38255%26name%3D%252Carch%253Dintel%252Cbranch%253Dmaster%252Cconfig%253Ddefault%252Cdevice_type%253DPixel_7_Pro%252Cdevice_version%253Dnone%252Chost_type%253Dlinux%252Csub_result%253D90th_percentile_frame_rasterizer_time_millis%252Ctest%253Dcomplex_layout_scroll_perf_impeller__timeline_summary%252C

Even though there are some big jumps, most of the work shows up as very gradual changes over time.

jonahwilliams avatar Feb 16 '24 03:02 jonahwilliams

I noticed that I only did a/b=2. The numbers make sense if I increase the number of runs, so I will close this.
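
To illustrate why more runs help, here is a small simulation. It is not the device lab harness: both "builds" draw their per-run averages from the same normal distribution, and the mean and standard deviation are made-up values chosen only for illustration. The function names are hypothetical.

```python
import random
import statistics


def aa_ratio(runs: int, rng: random.Random,
             mean_ms: float = 5.5, stddev_ms: float = 0.6) -> float:
    """Simulate one A/A comparison and return mean(A) / mean(B).

    Both sides sample from the same distribution, so any deviation
    from 1.0 is pure measurement noise.
    """
    a = [rng.gauss(mean_ms, stddev_ms) for _ in range(runs)]
    b = [rng.gauss(mean_ms, stddev_ms) for _ in range(runs)]
    return statistics.fmean(a) / statistics.fmean(b)


def ratio_spread(runs: int, trials: int = 2000, seed: int = 0) -> float:
    """Standard deviation of the A/A ratio across many simulated comparisons."""
    rng = random.Random(seed)
    ratios = [aa_ratio(runs, rng) for _ in range(trials)]
    return statistics.pstdev(ratios)


for runs in (2, 10, 50):
    print(f"runs per side = {runs:2d}: spread of A/A ratio = {ratio_spread(runs):.3f}")
```

In this model the spread of the ratio shrinks roughly with the square root of the number of runs, so a gap that looks significant with very few runs can disappear once more runs are averaged.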

hellohuanlin avatar Feb 22 '24 23:02 hellohuanlin

This thread has been automatically locked since there has not been any recent activity after it was closed. If you are still experiencing a similar issue, please open a new bug, including the output of flutter doctor -v and a minimal reproduction of the issue.

github-actions[bot] avatar Mar 08 '24 00:03 github-actions[bot]

Re-opening and assigning to our team, as the issue is still valid.

hellohuanlin avatar Mar 08 '24 00:03 hellohuanlin

I noticed that I only did a/b=2. The numbers make sense if I increase the number of runs, so I will close this.

the issue is still valid

What work are you suggesting needs to be done for this?

jmagman avatar Mar 08 '24 01:03 jmagman

Sorry, I meant to re-open a different issue that was closed by the infra team. This issue was resolved.

hellohuanlin avatar Mar 08 '24 23:03 hellohuanlin