~28ms regression to start up around 4/21, all types
Both COLD VIEW and COLD MAIN show a regression from 4/19 to 4/21. Due to the noise in previous results, it's not clear exactly on which date the regression begins:

For COLD VIEW, we see:
- 4/19: 1.856s
- 4/20: 1.889s (+33ms)
- 4/21: 1.917s (+28ms; +61ms total)
For COLD MAIN:
- 4/19: 593ms
- 4/20: 597ms (+5ms)
- 4/21: 606ms (+9ms; +14ms total)
It seems like the regression occurred on either 4/20 or 4/21.
@mcomella Running `python3 measure_start_up.py -c=25 --product=focus nightly cold_view_nav_start results.txt` twice on this commit, I obtained two results with a 45ms difference between median values. I know it is normal to get different values, but that seems like a pretty big difference. Is this expected?
==> results_v1.txt <==
{'max': 1810.0,
'mean': 1652.24,
'median': 1644.0,
'min': 1607.0,
'replicate_count': 25,
'replicates': [1810.0, 1653.0, 1607.0, 1648.0, 1640.0, 1638.0, 1643.0, 1642.0,
1623.0, 1634.0, 1627.0, 1652.0, 1650.0, 1619.0, 1644.0, 1651.0,
1625.0, 1699.0, 1633.0, 1661.0, 1665.0, 1647.0, 1631.0, 1646.0,
1718.0]}
==> results_v2.txt <==
{'max': 2131.0,
'mean': 1758.04,
'median': 1689.0,
'min': 1619.0,
'replicate_count': 25,
'replicates': [1790.0, 1704.0, 1636.0, 1665.0, 1619.0, 1650.0, 1655.0, 1679.0,
1671.0, 1633.0, 1655.0, 1689.0, 1625.0, 1632.0, 1658.0, 1859.0,
2014.0, 1968.0, 1982.0, 1947.0, 1845.0, 1790.0, 1709.0, 1745.0,
2131.0]}
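As an aside, a minimal sketch (not part of perf-tools) of how one might summarize and compare the two result files above, assuming each file contains a single Python-literal dict like those shown:

```python
# Sketch: summarize and compare two results files' replicate distributions.
# Assumes each file holds one Python-literal dict like the ones above.
import ast
from statistics import median, quantiles

def load_replicates(path):
    with open(path) as f:
        return ast.literal_eval(f.read())["replicates"]

v1 = load_replicates("results_v1.txt")
v2 = load_replicates("results_v2.txt")
for name, reps in (("v1", v1), ("v2", v2)):
    q1, _, q3 = quantiles(reps, n=4)  # quartiles
    print(f"{name}: median={median(reps):.0f}ms, IQR={q3 - q1:.0f}ms, n={len(reps)}")
print(f"median difference: {median(v2) - median(v1):+.0f}ms")
```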
@ionutbedregeanu Which device did you run on? On Fenix on a Moto G5, we see a typical range of about 20ms and, since runs aren't usually on the extremes, the difference between two runs is usually less, 10-15ms. I don't have a good intuition for Focus.
To debug, I looked at your results by graphing them (`./perf-tools/analyze_durations.py --graph results_v1.txt`) and I noticed an interesting pattern. Here's v1:

And v2:

v2's last ~10 runs take longer than the average run in the rest of v2 and in v1. I wonder if something happened during the run. A few possibilities: did the device's screen turn off? Did your computer go to sleep? Did some state change in the app or device (e.g. it opted into an experiment, started from a different home screen, or ended on a different app screen)? Another possibility is that you're seeing the bimodal behavior we see in fenix's VIEW start up (https://github.com/mozilla-mobile/fenix/issues/22144), but since it only appears in the last few results here, that seems unlikely.
In any case, the pattern of the results of v2 seems atypical – I wonder if you'd continue seeing this if you tried running it again a few times.
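For anyone reproducing this analysis, here is a rough equivalent of that graphing step, assuming matplotlib is available and that the results files contain the Python-literal dicts shown earlier; plotting replicates in run order makes late-run drift like v2's stand out:

```python
# Sketch: plot replicates in run order so late-run drift stands out.
import ast
import matplotlib.pyplot as plt

for path in ("results_v1.txt", "results_v2.txt"):
    with open(path) as f:
        reps = ast.literal_eval(f.read())["replicates"]
    plt.plot(range(1, len(reps) + 1), reps, marker="o", label=path)
plt.xlabel("run index")
plt.ylabel("cold_view_nav_start duration (ms)")
plt.legend()
plt.show()
```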
Yeah, it seems it was an isolated result. After more tests, the results seem pretty much the same.
After some tests, I noticed a regression introduced by https://github.com/mozilla-mobile/focus-android/pull/6897, though it looks more like a spike.
Before PR 6897:
==> results.txt <==
{'max': 1804.0,
'mean': 1653.44,
'median': 1644.0,
'min': 1603.0,
'replicate_count': 25,
'replicates': [1804.0, 1635.0, 1645.0, 1644.0, 1638.0, 1634.0, 1617.0, 1621.0,
1624.0, 1603.0, 1720.0, 1670.0, 1671.0, 1668.0, 1627.0, 1692.0,
1644.0, 1626.0, 1651.0, 1658.0, 1619.0, 1663.0, 1643.0, 1662.0,
1657.0]}
On PR 6897:
==> results.txt <==
{'max': 2128.0,
'mean': 1950.48,
'median': 2041.0,
'min': 1619.0,
'replicate_count': 25,
'replicates': [1760.0, 1654.0, 1619.0, 1648.0, 1622.0, 1771.0, 1860.0, 1952.0,
1967.0, 1993.0, 2055.0, 2027.0, 2033.0, 2063.0, 2128.0, 2061.0,
2070.0, 2048.0, 2069.0, 2054.0, 2064.0, 2077.0, 2041.0, 2057.0,
2069.0]}
After PR 6897:
==> results.txt <==
{'max': 1840.0,
'mean': 1658.08,
'median': 1643.0,
'min': 1622.0,
'replicate_count': 25,
'replicates': [1840.0, 1639.0, 1642.0, 1629.0, 1643.0, 1657.0, 1622.0, 1633.0,
1663.0, 1638.0, 1643.0, 1659.0, 1638.0, 1644.0, 1643.0, 1653.0,
1686.0, 1667.0, 1687.0, 1643.0, 1652.0, 1662.0, 1699.0, 1636.0,
1634.0]}
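To make the "spike" pattern concrete, here's a hypothetical sketch that compares the median of a run's first and second halves: the healthy before/after runs show roughly equal halves, while the "on PR 6897" run drifts up by roughly 200ms mid-run.

```python
# Sketch: flag a mid-run step change by comparing half-medians.
import ast
from statistics import median

def half_medians(path):
    with open(path) as f:
        reps = ast.literal_eval(f.read())["replicates"]
    mid = len(reps) // 2
    return median(reps[:mid]), median(reps[mid:])

first, second = half_medians("results.txt")
print(f"first half: {first:.0f}ms, second half: {second:.0f}ms, "
      f"drift: {second - first:+.0f}ms")
```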
Also, after two runs I noticed a regression introduced by https://github.com/mozilla-mobile/focus-android/pull/6911. Device: Pixel 3a, Android 11; fresh install every time.
==> results.txt <==
{'max': 2078.0,
'mean': 1998.48,
'median': 2047.0,
'min': 1623.0,
'replicate_count': 25,
'replicates': [1830.0, 1656.0, 1623.0, 2024.0, 2047.0, 1999.0, 2017.0, 1936.0,
1955.0, 2058.0, 2061.0, 2030.0, 2062.0, 2046.0, 2060.0, 2060.0,
2067.0, 2058.0, 2078.0, 2074.0, 2034.0, 2057.0, 2051.0, 2025.0,
2054.0]}
I ran another session of testing and a git bisect in A-C to find the main cause of the regression mentioned above (https://github.com/mozilla-mobile/focus-android/pull/6911), and it turns out that this commit introduced the regression.
Command: `python3 measure_start_up.py -c=50 --product=focus nightly cold_view_nav_start --force results.txt`
Device: Pixel 3a, Android 11
Build: Focus Nightly (fresh build on every session of testing)
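As an aside, a bisect like this can be driven automatically with `git bisect run`; here is a hypothetical sketch of such a driver script (the gradle task, the threshold, and the results-file parsing are placeholder assumptions, not from this thread):

```python
# Hypothetical driver for `git bisect run python3 bisect_check.py`:
# build, measure, then exit 0 (good) / 1 (bad) based on a median threshold.
import ast
import subprocess
import sys

THRESHOLD_MS = 1850  # placeholder: midway between the good and bad medians

# Placeholder build/install step; the real task name may differ.
subprocess.run(["./gradlew", "installFocusNightly"], check=True)
subprocess.run(
    ["python3", "measure_start_up.py", "-c=50", "--product=focus",
     "nightly", "cold_view_nav_start", "--force", "results.txt"],
    check=True,
)
with open("results.txt") as f:
    med = ast.literal_eval(f.read())["median"]  # assumes the dict format shown above
print(f"median: {med}ms")
sys.exit(0 if med < THRESHOLD_MS else 1)
```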
Unfortunately, I don't have a device with an Android version < 11, and I can't set the permissions to capture the profile using the Debug API.
I will unassign this from me, maybe someone else can help
The Firefox Profiler can capture start up from the time Gecko starts onward: see these instructions. Since the regression comes from a GeckoView upgrade, there is a good chance we'd be able to capture the regression in such a profile (unless it occurs while GeckoView is starting rather than afterwards).
@mcomella I followed the above instructions, but the comparator didn't find any differences between the profiles, so the difference probably occurs while GeckoView is starting.
Tested scenario:
- Follow the Manual configuration steps
- Install a fresh build of Focus Nightly
- Go to the app home screen (now the device is available in the Mozilla Profiler)
- Start Recording from the Mozilla Profiler
- Navigate to Wikipedia
- Stop recording
Public profile before https://github.com/mozilla-mobile/focus-android/pull/6911
https://share.firefox.dev/3HaQzIq
Public profile after
https://share.firefox.dev/3H6swKm
Differences between profiles
https://profiler.firefox.com/compare/network-chart/?globalTrackOrder=340w2&hiddenGlobalTracks=0134&hiddenLocalTracksByPid=18342%20from%20profile%201-0~19652%20from%20profile%202-0&localTrackOrderByPid=18342%20from%20profile%201-0~19652%20from%20profile%202-0&profiles[]=https%3A%2F%2Fshare.firefox.dev%2F3HaQzIq&profiles[]=https%3A%2F%2Fshare.firefox.dev%2F3H6swKm&thread=2&v=6
@ionutbedregeanu Our nightly runs only test cold start ups, but this profile doesn't appear to be of a cold start up. In that case, you'd see the main thread maxed out for a few seconds at the start of the profile, like in this profile: https://share.firefox.dev/3MBwwDY. Make sure to force stop the app (in the Android UI or through adb) before launching.
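For reference, a minimal sketch of forcing a cold start before a launch via adb (the package name `org.mozilla.focus.nightly` and the `am start-activity` invocation are taken from later in this thread; the URL is a placeholder):

```python
# Sketch: force-stop the app so the next launch is a cold start, then launch.
import subprocess

PKG = "org.mozilla.focus.nightly"
subprocess.run(["adb", "shell", "am", "force-stop", PKG], check=True)
subprocess.run(
    ["adb", "shell", "am", "start-activity",
     "-d", "https://example.org/",  # placeholder URL
     "-a", "android.intent.action.VIEW", PKG],
    check=True,
)
```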
Also, I noticed the profile was taken on a Pixel 3. This is a pretty fast device, so you'll probably have difficulty seeing a small difference like the 28ms from the Moto G5 that our nightly tests run on. For context, 28ms is close to the smallest regression we can detect on that device. It would be preferable to take these profiles on a low-end device. fwiw, we want to change our reference device to the Galaxy A51, so you can use that if you happen to have one of those.
I already talked with the team about transferring me a device with a lower Android version, so I will postpone this ticket until then. Or, as I said above, if someone else can help, feel free to take the ticket.
> Unfortunately, I don't have a device with an Android version < 11, and I can't set the permissions to capture the profile using the Debug API.
You don't need to prioritize this work but FYI I added instructions for taking profiles on Android 11: https://wiki.mozilla.org/Performance/Fenix/Debug_API#Usage_for_start_up:_Android_11_and_later
I built the startup trace for both scenarios, before and after https://github.com/mozilla-mobile/android-components/pull/12027. This is the report for both traces: https://profiler.firefox.com/compare/calltree/?globalTrackOrder=0w2&profiles[]=https%3A%2F%2Fshare.firefox.dev%2F3uuMee0&profiles[]=https%3A%2F%2Fshare.firefox.dev%2F3RiXTGs&thread=0&v=6
However, I don't fully understand the legend and how the report should be read to see the differences. @mcomella, can you help me with this?
I find the Profiler Compare view complex to navigate; I've rarely found it more intuitive than comparing two profiles by switching quickly between them in separate tabs. I set the profiles to the "Stack Chart" view and make sure they're set to a comparable range (e.g. pick a marker to end the range on). I'd recommend trying that if the Profiler Compare view is unintuitive.
For the Profiler Compare view, it looks like the link provided compares the same profile against itself (I determined this by clicking each "main" track in the "Stack Chart" view and seeing that it didn't change). It might make more sense once you load in two separate profiles.
fwiw, @Amejia481 is also familiar with at least the first method if you have more questions.
I apologize for not responding sooner – I just saw this now.
> (e.g. it opted into an experiment, started from a different home screen, or ended on a different app screen)
@mcomella I'm not sure if the tests take into consideration that Focus added a splash screen, which can delay startup by up to 2 seconds in order to download experiments and run them when the app starts.
Should we disable experiments for these tests (also in Fenix)? I'm not sure they really help us measure a real startup, and they could cause noise while testing.
I tested this on a Samsung S7 with Android 8.0 with Power saving mode activated. The tests were done using at least 100 runs, because there was a lot of variation between results and 25 or 50 runs would not give consistent/reliable mean/median values.
Following the steps from https://github.com/mozilla-mobile/perf-tools, I tested using:
`python3 measure_start_up.py -c=100 --product=focus nightly cold_view_nav_start`
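To illustrate why 25 or 50 runs weren't enough on this device, here is a hypothetical sketch that bootstraps the spread of the median at different run counts from one observed set of replicates (assuming the results-file format shown earlier):

```python
# Sketch: estimate how stable the median is at different run counts
# by resampling one observed set of replicates.
import ast
import random
from statistics import median, pstdev

with open("results.txt") as f:
    reps = ast.literal_eval(f.read())["replicates"]

random.seed(0)  # reproducible resampling
for n in (25, 50, 100):
    medians = [median(random.choices(reps, k=n)) for _ in range(1000)]
    print(f"n={n}: bootstrap stdev of the median ~{pstdev(medians):.1f}ms")
```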
Based on the reported regression time range, I focused my tests on Focus on the following PRs merged between April 20-22.
Results on Focus:
The following runs were used to establish a good baseline:
| Merge day | Revision # | Title | # runs | Max | Mean | Median | Min | Good | Graph |
|---|---|---|---|---|---|---|---|---|---|
| 18 | a8683cc | Update to AC 101.0.20220418190127 | 100 | 2013 | 1808 | 1818 | 1562 | ✅ | link |
| 19 | 463ed77 | Import strings from android-l10n | 100 | 2017 | 1800 | 1789 | 1568 | ✅ | link |
| 19 | 6eaa634 | Update to AC 101.0.20220419190137 | 100 | 1984 | 1775 | 1791 | 1524 | ✅ | link |
| 20 | 9727268 | Update to AC 101.0.20220420143047 | 100 | 2552 | 1807 | 1808 | 1537 | ✅ | link |
The next ones are listed in merge order:
| Merge day | Revision # | Title | # runs | Max | Mean | Median | Min | Good | Graph |
|---|---|---|---|---|---|---|---|---|---|
| 21 | 3cd24bc | Update to AC 101.0.20220421143106 | 100 | 1997 | 1800 | 1809 | 1561 | ✅ | link |
| 21 | e522505 | Update to AC 101.0.20220421190113 | 150 | 2191 | 1881 | 1877 | 1564 | ❌ | link |
| 21 | b68eb20 | Update to AC 101.0.20220423143138 | 100 | 2189 | 1870 | 1848 | 1585 | ❌ | link |
The e5225054 changeset, which updates the A-C version, is the same one @ionutbedregeanu found to have introduced the regression. Note that the PR after which the regression is seen on Focus was opened on Apr 21, 2022, 10:20 PM GMT+3, so the GV update proposed as having introduced the regression in A-S in https://github.com/mozilla-mobile/focus-android/issues/6923#issuecomment-1150124016 seems unlikely, as there was another GV update on Apr 20 and yet another GV update merged on Apr 21, 2022, 5:01 PM GMT+3.
I then moved on to test the A-C PRs starting from Apr 20, but found the results inconclusive, so I expanded the testing to also cover Apr 19 with this AC list of merges and got the following data:
Results on A-C:
| Merge day | Revision # | Title | # runs | Max | Mean | Median | Min | Good | Graph |
|---|---|---|---|---|---|---|---|---|---|
| 19 | 8905464 | Update suggestions everytime SuggestionFetcher updates | 100 | 2250 | 1882 | 1872 | 1528 | ? | link |
| 19 | e07544f | Update to GV 101.0.20220419093010 | 100 | 2104 | 1846 | 1871 | 1508 | ? | link |
| 20 | 01012c1 | Import strings from android-l10n | 100 | 2197 | 1813 | 1830 | 1540 | ? | link |
| 20 | 4687fa5 | Update to GV 101.0.20220420094247 | 100 | 2146 | 1849 | 1857 | 1585 | ? | link |
| 20 | d8cc5fb | Webcompat interventions | 100 | 2040 | 1872 | 1868 | 1613 | ? | link |
| 20 | 1d0457a | Infer mime type | 100 | 2230 | 1813 | 1845 | 1539 | ? | link |
| 20 | be4dbf1 | Added features to support A-S | 100 | 2277 | 1834 | 1856 | 1539 | ? | link |
| 21 | 3d66c19 | Update to GV 101.0.20220421094346 | 100 | 2098 | 1842 | 1854 | 1577 | ? | link |
| 22 | af3eaca | Support auto hiding edit action end | 100 | 2283 | 1885 | 1877 | 1554 | ? | link |
| 22 | 88156d1 | Update to GV 101.0.20220422100351 | 100 | 2132 | 1839 | 1842 | 1564 | ? | link |
The A-C tests were done using https://github.com/mozilla-mobile/focus-android/commit/3cd24bc882d6d77f5b2ff8059535697008636136 as a base in Focus and involved executing clean in A-C followed by a rebuild of the Focus project.
The above results on A-C seem inconclusive to me though. I'll check with Ionut and maybe try testing on another device.
Note that in my tests I didn't see the new splash screen, as expected, since it was merged in May.
All my tests can be seen here.
I saw that a similar regression occurred on Fenix as well, which would confirm the issue stems from A-S and potentially GV.
I spoke with @ionutbedregeanu about the next steps, and we concluded that I would test A-C on another personal device I have, a Nokia 6.1 with Android 10.
I started testing this AC list of merges again on a Nokia 6.1 with Android 10 and got the following data:
Results on A-C:
| Merge day | Revision # | Title | # runs | Max | Mean | Median | Min | Good | Graph |
|---|---|---|---|---|---|---|---|---|---|
| 18 | f12e657 | Import strings from android-l10n | 100 | 4813 | 4713 | 4713 | 4625 | ? | link |
| 18 | 9168dff | Update to GV 101.0.20220418091627 | 25 | 4802 | 4731 | 4737 | 4665 | ? | link |
| 18 | ac93f1f | Update compose to 1.1.1 and Kotlin to 1.6.10 | 50 | 4830 | 4738 | 4738 | 4768 | ? | link |
| 19 | 890546 | Update suggestions everytime SuggestionFetcher updates | 50 | 4938 | 4820 | 4823 | 4697 | ? | link |
| ... | | | | | | | | | |
| 22 | 88156d1 | Update to GV 101.0.20220422100351 | 50 | 4907 | 4768 | 4757 | 4675 | ? | link |
Despite showing a lower variation between min and max, the noise is still too high to allow confidently identifying which changeset introduced the regression. The results can be seen in the same folder as the ones from above: https://drive.google.com/drive/u/1/folders/1CQ9MhWZUF4WEzJaGgW_WPolBGfkR3JM1
I can try testing using a Pixel 3 with Android 11 (same as Ionut) in the hope that, although the times will be lower, there will be less noise and the regression can be identified.
Seeing that bisecting commits may give false results, because the differences between runs can be higher than the reported regression, I tested a block of commits around April 21st from this AC list of merges on a Pixel 3 with Android 11 and got the following data:
Results on A-C:
| Merge day | Revision # | Title | # runs | Max | Mean | Median | Min | Good | Graph |
|---|---|---|---|---|---|---|---|---|---|
| 19 | e07544f | Update to GV 101.0.20220419093010 | 25 | 876 | 640 | 613 | 592 | ✅ | link |
| 20 | 01012c1 | Import strings from android-l10n | 25 | 874 | 657 | 623 | 597 | ✅ | link |
| 20 | 4687fa5 | Update to GV 101.0.20220420094247 | 30 | 903 | 645 | 622 | 609 | ✅ | link |
| 20 | d8cc5fb | Webcompat interventions | 100 | 1011 | 660 | 630 | 600 | ✅ | link |
| 20 | 1d0457a | Infer mime type | 30 | 909 | 640 | 612 | 590 | ✅ | link |
| 20 | be4dbf1 | Added features to support A-S | 30 | 946 | 668 | 629 | 610 | ✅ | link |
| 21 | f0ebd29 | Import strings from android-l10n | 100 | 898 | 626 | 608 | 588 | ✅ | link |
| 21 | 3d66c19 | Update to GV 101.0.20220421094346 | 100 | 1292 | 730 | 722 | 590 | ❌ | link |
| 21 | 6d4ff98 | Disable auto report on the AndroidManifest | 100 | 1360 | 762 | 717 | 590 | ❌ | link |
| ... | | | | | | | | | |
| 22 | 88156d1 | Update to GV 101.0.20220422100351 | 25 | 770 | 717 | 709 | 691 | ❌ | link |
The results seem more consistent on this device, and they indicate that https://github.com/mozilla-mobile/android-components/pull/12043 introduced the regression.
@mcomella I tried profiling the startup using the steps from https://github.com/mozilla-mobile/focus-android/issues/6923#issuecomment-1169238181, but started the app with `measure_start_up.py -c=1 --product=focus nightly cold_view_nav_start` as an easy way to also measure the time taken, since I suspected the inherent startup differences may be higher than the regression being searched for.
And indeed, I saw numbers anywhere between 620ms and 1292ms, so any profiles taken would not provide the confidence needed to confirm a regression in GV.
I wanted to confirm with you that, at this point, the next step for investigating this and trusting the results would be using `measure_start_up.py --product=geckoview nightly cold_view_nav_start` (the script would need minor modifications).
I don't currently have the setup to build GeckoView locally, so I would report the regression to Bugzilla and let the team there continue the investigation.
@Mugurell From your previous comment https://github.com/mozilla-mobile/focus-android/issues/6923#issuecomment-1209002708, we can conclude the regression occurred due to a GV update. As you point out, the easiest way to find the root cause of the regression would be to bisect the commits to GeckoView (either by changing the version of GV in fenix or with GVE). Unfortunately, the tools don't make this easy currently. The second easiest method is usually to take profiles.
> I don't currently have the setup to build GeckoView locally, so I would report the regression to Bugzilla and let the team there continue the investigation.
This makes sense to me but it depends on how the GV team and the fenix team agree to handle regressions in GV. If you file an issue in Bugzilla, please add the "Performance: ?" flag.
> And indeed, I saw numbers anywhere between 620ms and 1292ms, so any profiles taken would not provide the confidence needed to confirm a regression in GV.
You generally want to compare a profile from before the regression and one from after it. If there is too much variance in duration such that they're incomparable, one technique I use is to take several profiles of each scenario (5? more?), take the shortest of each (e.g. I often set the track to the first doOnPreDraw marker and compare the length at the top left), and compare those. While this isn't perfect, you're generally getting the run with minimal noise for each scenario, which is more comparable than two runs with arbitrary amounts of noise.
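A toy illustration of that technique (all durations below are made-up placeholders):

```python
# Sketch: compare the minimum-duration profile per scenario instead of
# two arbitrary runs, to strip out per-run noise.
before = [702, 685, 731, 690, 668]  # placeholder profile durations (ms)
after = [748, 722, 760, 719, 741]

print(f"before (min of {len(before)}): {min(before)}ms")
print(f"after  (min of {len(after)}): {min(after)}ms")
print(f"regression estimate: {min(after) - min(before):+d}ms")
```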
Sorry for my slow update here, my trusty Pixel 3 had to be sent for repair.
Based on the Focus data from https://github.com/mozilla-mobile/focus-android/issues/6923#issuecomment-1206192477 I followed the steps from https://profiler.firefox.com/docs/#/./guide-remote-profiling and https://profiler.firefox.com/docs/#/./guide-startup-shutdown?id=firefox-for-android to profile the startup of official Focus Nightly builds (arm64-v8a) on a Pixel 3 with Android 12.
At a quick glance the data was not consistent enough, which is why I took multiple profiles following the same steps:
- with "Remote debugging via USB/Wi-Fi" disabled
- with the latest Release (107.0) of Firefox for Mac installed and `about:debugging` opened
- ran `adb shell am start-activity -d "https://www.mozilla.org/" -a android.intent.action.VIEW org.mozilla.focus.nightly`
- in the newly opened app, enabled "Remote debugging via USB/Wi-Fi"
- connected to the app in `about:debugging` on Desktop
- clicked "Start recording" -> "Capture profile"
Results:
| for the "good" build | for the "bad" build |
|---|---|
| https://share.firefox.dev/3tSmGXy | https://share.firefox.dev/3ET2c76 |
| https://share.firefox.dev/3UX238g | https://share.firefox.dev/3GyDfiF |
| https://share.firefox.dev/3Ov5eBz | https://share.firefox.dev/3TTa5hj |
| https://share.firefox.dev/3tXOLMN | https://share.firefox.dev/3U078LU |
| https://share.firefox.dev/3tPzIoH | https://share.firefox.dev/3OqVUin |
@mcomella @MarcLeclair Please advise.
Moved to Bugzilla: https://bugzilla.mozilla.org/show_bug.cgi?id=1802642
Change performed by the Move to Bugzilla add-on.