focus-android icon indicating copy to clipboard operation
focus-android copied to clipboard

~28ms regression to start up around 4/21, all types

Open mcomella opened this issue 3 years ago • 19 comments

Both COLD VIEW and COLD MAIN show a regression from 4/19 to 4/21. Due to the noise of previous results, it's not clear exactly which date the regression begins:

image

For COLD VIEW, we see:

  • 4/19: 1.856s
  • 4/20: 1.889s (-33ms)
  • 4/21: 1.917s (-28ms; 61ms total)

For COLD MAIN:

  • 4/19: 593ms
  • 4/20: 597ms (-5ms)
  • 4/21: 606ms (-9ms; 14ms total)

It seems like the regression occurred on either 4/20 or 4/21.

mcomella avatar Apr 25 '22 21:04 mcomella

@mcomella Running python3 measure_start_up.py -c=25 --product=focus nightly cold_view_nav_start results.txt for two times on this commit I obtained two results with a 45ms difference between median values. I know that is normal to have different values but seems a pretty big difference. Is this expected?

==> results_v1.txt <==
{'max': 1810.0,
 'mean': 1652.24,
 'median': 1644.0,
 'min': 1607.0,
 'replicate_count': 25,
 'replicates': [1810.0, 1653.0, 1607.0, 1648.0, 1640.0, 1638.0, 1643.0, 1642.0,
                1623.0, 1634.0, 1627.0, 1652.0, 1650.0, 1619.0, 1644.0, 1651.0,
                1625.0, 1699.0, 1633.0, 1661.0, 1665.0, 1647.0, 1631.0, 1646.0,
                1718.0]}

==> results_v2.txt <==
{'max': 2131.0,
 'mean': 1758.04,
 'median': 1689.0,
 'min': 1619.0,
 'replicate_count': 25,
 'replicates': [1790.0, 1704.0, 1636.0, 1665.0, 1619.0, 1650.0, 1655.0, 1679.0,
                1671.0, 1633.0, 1655.0, 1689.0, 1625.0, 1632.0, 1658.0, 1859.0,
                2014.0, 1968.0, 1982.0, 1947.0, 1845.0, 1790.0, 1709.0, 1745.0,
                2131.0]}

ionutbedregeanu avatar Jun 06 '22 14:06 ionutbedregeanu

@ionutbedregeanu Which device did you run on? On fenix on a Moto G5, we see a typical range of about 20ms and, since runs aren't usually on the extremes, the difference between two runs is usually less – 10-15ms. I don't have a good intuition for focus.

To debug, I looked at your results by graphing them (./perf-tools/analyze_durations.py --graph results_v1.txt) and I noticed an interesting pattern. Here's v1: v1

And v2: v2

v2's last ~10 runs take longer than the average run in the rest of v2 and in v1. I wonder if something happened during the run – a few ideas I can think of are, did the device's screen turn off? Did your computer go to sleep? Did some state change in the app or device (e.g. it opted into an experiment, started from a different home screen, or ended on a different app screen)? Another possibility is that you're seeing the bimodal behavior we see in fenix's VIEW start up https://github.com/mozilla-mobile/fenix/issues/22144 but since that only happens on the last few results, that seems unlikely.

In any case, the pattern of the results of v2 seems atypical – I wonder if you'd continue seeing this if you tried running it again a few times.

mcomella avatar Jun 07 '22 00:06 mcomella

Yeah, it seems that it was an isolated result. After another tests the results seems pretty much the same

ionutbedregeanu avatar Jun 07 '22 07:06 ionutbedregeanu

After some tests I notice a regression added by https://github.com/mozilla-mobile/focus-android/pull/6897, but more like a spike

Before 6897 pr:

==> results.txt <==
{'max': 1804.0,
 'mean': 1653.44,
 'median': 1644.0,
 'min': 1603.0,
 'replicate_count': 25,
 'replicates': [1804.0, 1635.0, 1645.0, 1644.0, 1638.0, 1634.0, 1617.0, 1621.0,
                1624.0, 1603.0, 1720.0, 1670.0, 1671.0, 1668.0, 1627.0, 1692.0,
                1644.0, 1626.0, 1651.0, 1658.0, 1619.0, 1663.0, 1643.0, 1662.0,
                1657.0]}

On 6897 pr:

==> results.txt <==
{'max': 2128.0,
 'mean': 1950.48,
 'median': 2041.0,
 'min': 1619.0,
 'replicate_count': 25,
 'replicates': [1760.0, 1654.0, 1619.0, 1648.0, 1622.0, 1771.0, 1860.0, 1952.0,
                1967.0, 1993.0, 2055.0, 2027.0, 2033.0, 2063.0, 2128.0, 2061.0,
                2070.0, 2048.0, 2069.0, 2054.0, 2064.0, 2077.0, 2041.0, 2057.0,
                2069.0]}

After 6897 pr:

==> results.txt <==
{'max': 1840.0,
 'mean': 1658.08,
 'median': 1643.0,
 'min': 1622.0,
 'replicate_count': 25,
 'replicates': [1840.0, 1639.0, 1642.0, 1629.0, 1643.0, 1657.0, 1622.0, 1633.0,
                1663.0, 1638.0, 1643.0, 1659.0, 1638.0, 1644.0, 1643.0, 1653.0,
                1686.0, 1667.0, 1687.0, 1643.0, 1652.0, 1662.0, 1699.0, 1636.0,
                1634.0]}

Also after two runs I noticed a regression added by https://github.com/mozilla-mobile/focus-android/pull/6911 Device: Pixel 3a Android 11, fresh install every time

==> results.txt <==
{'max': 2078.0,
 'mean': 1998.48,
 'median': 2047.0,
 'min': 1623.0,
 'replicate_count': 25,
 'replicates': [1830.0, 1656.0, 1623.0, 2024.0, 2047.0, 1999.0, 2017.0, 1936.0,
                1955.0, 2058.0, 2061.0, 2030.0, 2062.0, 2046.0, 2060.0, 2060.0,
                2067.0, 2058.0, 2078.0, 2074.0, 2034.0, 2057.0, 2051.0, 2025.0,
                2054.0]}

ionutbedregeanu avatar Jun 07 '22 12:06 ionutbedregeanu

I made another session of testing and git bisect in AC to find the main cause of regression mentioned above(https://github.com/mozilla-mobile/focus-android/pull/6911) and it results that this commit added the regression Command: python3 measure_start_up.py -c=50 --product=focus nightly cold_view_nav_start --force results.txt Device: Pixel 3a Android 11 Build: Focus Nightly(fresh build on every session of testing) Unfortunately I don't have a device with an android version < 11 and I can't set the permissions to capture the profile using the Debug API I will unassign this from me, maybe someone else can help

ionutbedregeanu avatar Jun 08 '22 16:06 ionutbedregeanu

The Firefox Profiler can capture start up from the time Gecko starts and onward: see these instructions. Since the regression comes from a GeckoView upgrade, there is a good chance we'd be able to capture the regression in that profile (unless it occurs while GeckoView is starting rather than afterwards).

mcomella avatar Jun 08 '22 18:06 mcomella

@mcomella I followed the above instructions, but the comparator didn't find any differences between them, so probably the difference is while GeckoView is starting.

Tested scenario

  1. Follow Manual configuration steps
  2. Install a fresh build of Focus Nightly
  3. Go to app home screen(now the device is available in Mozilla Profiler)
  4. Start Recording from Mozilla Profiler
  5. Navigate to wikipedia
  6. Stop recording

Public profile before https://github.com/mozilla-mobile/focus-android/pull/6911

https://share.firefox.dev/3HaQzIq

Public profile after

https://share.firefox.dev/3H6swKm

Differences between profiles

https://profiler.firefox.com/compare/network-chart/?globalTrackOrder=340w2&hiddenGlobalTracks=0134&hiddenLocalTracksByPid=18342%20from%20profile%201-0~19652%20from%20profile%202-0&localTrackOrderByPid=18342%20from%20profile%201-0~19652%20from%20profile%202-0&profiles[]=https%3A%2F%2Fshare.firefox.dev%2F3HaQzIq&profiles[]=https%3A%2F%2Fshare.firefox.dev%2F3H6swKm&thread=2&v=6

ionutbedregeanu avatar Jun 09 '22 10:06 ionutbedregeanu

@ionutbedregeanu Our nightly runs only test cold start ups but this profile doesn't appear to be of a cold start up. In that case, you'd see the main thread maxed out for a few seconds at the start of the profile, like this profile: https://share.firefox.dev/3MBwwDY Make sure to force stop the app (in the Android UI or through adb) before launching.

Also, I noticed the profile was taken on a Pixel 3 – this is a pretty fast device so you'll probably have difficulty seeing a small difference like 28ms from the Moto G5 which our nightly tests run on. For context, 28ms is close to the smallest regression we can detect on that device. It would be preferable to take these profiles on a low-end device. fwiw, we want to change our reference device to the Galaxy A51, so you can use that if you happen to have one of those.

mcomella avatar Jun 09 '22 17:06 mcomella

I already talked with the team to transfer me a device with a lower android version, so I will postpone this ticket until then. Or as I said above, if someone else can help feel free to take the ticket

ionutbedregeanu avatar Jun 10 '22 06:06 ionutbedregeanu

Unfortunately I don't have a device with an android version < 11 and I can't set the permissions to capture the profile using the Debug API

You don't need to prioritize this work but FYI I added instructions for taking profiles on Android 11: https://wiki.mozilla.org/Performance/Fenix/Debug_API#Usage_for_start_up:_Android_11_and_later

mcomella avatar Jun 28 '22 21:06 mcomella

I built the startup trace for both scenarios, before and after https://github.com/mozilla-mobile/android-components/pull/12027 This is the report for both traces https://profiler.firefox.com/compare/calltree/?globalTrackOrder=0w2&profiles[]=https%3A%2F%2Fshare.firefox.dev%2F3uuMee0&profiles[]=https%3A%2F%2Fshare.firefox.dev%2F3RiXTGs&thread=0&v=6 But I don't fully understand the legend and how the report should be read to see the differences @mcomella Can you help me with this?

ionutbedregeanu avatar Jul 08 '22 07:07 ionutbedregeanu

I find the Profiler Compare View to be complex to navigate and I've only rarely found it more intuitive than by comparing two profiles by switching quickly between each profile in separate tabs. I set the profiles to the "Stack Chart" view and make sure they're set to a comparable range (e.g. pick a marker to end the range on). I might recommend trying that if the Profiler Compare is unintuitive.

For the Profiler Compare View, it looks like the link provided is comparing the same two profiles (I determined this by clicking each "main" track in the "Stack Chart" view and seeing that it didn't change). It might make more sense when you load in two separate profiles.

fwiw, @Amejia481 is also familiar with at least the first method if you have more questions.

I apologize for not responding sooner – I just saw this now.

mcomella avatar Jul 12 '22 23:07 mcomella

(e.g. it opted into an experiment, started from a different home screen, or ended on a different app screen)

@mcomella not sure if the tests are taking into consideration that focus added a splash screen that is delaying startups by up to 2 seconds to be able to download experiments and run them when the app starts.

Should we disabled experiments for this tests (also in Fenix)? I'm not sure if they are really helping us to measure a real startup and could cause noise while testing?

Amejia481 avatar Jul 13 '22 13:07 Amejia481

I tested this on a Samsung S7 with Android 8.0 with Power saving mode activated. The tests were done using at least 100 runs because there was a lot of variation between results and 25 or 50 runs would not give consistent/reliable mean/median values.

Following the steps from https://github.com/mozilla-mobile/perf-tools I tested with using

python3 measure_start_up.py -c=100 --product=focus nightly cold_view_nav_start

Based on the reported regression time range I focused my tests on Focus on the following PRs merged between April 20-22.

Results on Focus:

The following were used to have a good baseline:

Merge day Revision # Title # runs Max Mean Median Min Good Graph
18 a8683cc Update to AC 101.0.20220418190127 100 2013 1808 1818 1562 link
19 463ed77 Import strings from android-l10n 100 2017 1800 1789 1568 link
19 6eaa634 Update to AC 101.0.20220419190137 100 1984 1775 1791 1524 link
20 9727268 Update to AC 101.0.20220420143047 100 2552 1807 1808 1537 link

The next ones are in merge order

Merge day Revision # Title # runs Max Mean Median Min Good Graph
21 3cd24bc Update to AC 101.0.20220421143106 100 1997 1800 1809 1561 link
21 e522505 Update to AC 101.0.20220421190113 150 2191 1881 1877 1564 link
21 b68eb20 Update to AC 101.0.20220423143138 100 2189 1870 1848 1585 link

  The e5225054 changeset which updates the A-C version is the same @ionutbedregeanu found to have introduced the regression and the To note that that PR after which the regression in seen on Focus was opened on Apr 21, 2022, 10:20 PM GMT+3 so the GV update proposed as introducing the regression in A-S in https://github.com/mozilla-mobile/focus-android/issues/6923#issuecomment-1150124016 seems unlikely as there was another GV update on Apr 20 and even another GV update merged on Apr 21, 2022, 5:01 PM GMT+3.

I then moved to test on AC the PRS starting from Apr 20 but found the results to be inconclusive and so expanded the testing to cover Apr 19 also with this AC list of merges and got the following data:

Results on A-C:

Merge day Revision # Title # runs Max Mean Median Min Good Graph
19 8905464 Update suggestions everytime SuggestionFetcher updates 100 2250 1882 1872 1528 ? link
19 e07544f Update to GV 101.0.20220419093010 100 2104 1846 1871 1508 ? link
20 01012c1 Import strings from android-l10n 100 2197 1813 1830 1540 ? link
20 4687fa5 Update to GV 101.0.20220420094247 100 2146 1849 1857 1585 ? link
20 d8cc5fb Webcompat interventions 100 2040 1872 1868 1613 ? link
20 1d0457a Infer mime type 100 2230 1813 1845 1539 ? link
20 be4dbf1 Added features to support A-S 100 2277 1834 1856 1539 ? link
21 3d66c19 Update to GV 101.0.20220421094346 100 2098 1842 1854 1577 ? link
22 af3eaca Support auto hiding edit action end 100 2283 1885 1877 1554 ? link
22 88156d1 Update to GV 101.0.20220422100351 100 2132 1839 1842 1564 ? link

 

TheA-C tests were done using https://github.com/mozilla-mobile/focus-android/commit/3cd24bc882d6d77f5b2ff8059535697008636136 as a base in Focus and involved executing clean in A-C followed by a rebuild of the Focus project. The above results on A-C seem inconclusive to me though. I'll check with Ionut and maybe try testing on another device. Note that in my tests I didn't see the new splashscreen, as expected as it was merged in May.

All my tests can be seen here.

Mugurell avatar Aug 05 '22 08:08 Mugurell

Saw that a similar regression occured on Fenix also which would confirm the issue stems from AS and potentially GV.

| image | image |

Spoke with @ionutbedregeanu about the next steps and concluded on me testing on AC on another personal device I have a Nokia 6.1 with Android 10.

Mugurell avatar Aug 05 '22 09:08 Mugurell

Started testing again this AC list of merges on a Nokia 6.1 with Android 10 and got the following data:

Results on A-C:

Merge day Revision # Title # runs Max Mean Median Min Good Graph
18 f12e657 Import strings from android-l10n 100 4813 4713 4713 4625 ? link
18 9168dff Update to GV 101.0.20220418091627 25 4802 4731 4737 4665 ? link
18 ac93f1f Update compose to 1.1.1 and Kotlin to 1.6.10 50 4830 4738 4738 4768 ? link
19 890546 Update suggestions everytime SuggestionFetcher updates 50 4938 4820 4823 4697 ? link
...
22 88156d1 Update to GV 101.0.20220422100351 50 4907 4768 4757 4675 ? link

Which despite showing a lower variation between min and max it is still too high to allow for confidently identifying which changeset introduced the regression. Results can be seen in the same folder as the ones from above - https://drive.google.com/drive/u/1/folders/1CQ9MhWZUF4WEzJaGgW_WPolBGfkR3JM1

I can try testing using a Pixel 3 with Android 11 (same as Ionut) in the hope that although the times will be lower there will be less noise and a regression can be identified.

Mugurell avatar Aug 05 '22 15:08 Mugurell

Seeing that bisecting commits may give false results because of differences between runs higher than the reported regression I tested a block of commits around April 21st from this AC list of merges on a Pixel 3 with Android 11 and got the following data:

Results on A-C:

Merge day Revision # Title # runs Max Mean Median Min Good Graph
19 e07544f Update to GV 101.0.20220419093010 25 876 640 613 592 link
20 01012c1 Import strings from android-l10n 25 874 657 623 597 link
20 4687fa5 Update to GV 101.0.20220420094247 30 903 645 622 609 link
20 d8cc5fb Webcompat interventions 100 1011 660 630 600 link
20 1d0457a Infer mime type 30 909 640 612 590 link
20 be4dbf1 Added features to support A-S 30 946 668 629 610 link
21 f0ebd29 Import strings from android-l10n 100 898 626 608 588 link
21 3d66c19 Update to GV 101.0.20220421094346 100 1292 730 722 590 link
21 6d4ff98 Disable auto report on the AndroidManifest 100 1360 762 717 590 link
...
22 88156d1 Update to GV 101.0.20220422100351 25 770 717 709 691 link

  Seems like the results are more consistent on this device and they indicate https://github.com/mozilla-mobile/android-components/pull/12043 introduced the regression.

Mugurell avatar Aug 09 '22 07:08 Mugurell

@mcomella Tried profiling the startup with using the steps from https://github.com/mozilla-mobile/focus-android/issues/6923#issuecomment-1169238181 but by starting the app with measure_start_up.py -c=1 --product=focus nightly cold_view_nav_start as an easy way to also measure the time needed since I suspected the inherent startup differences may be higher than the searched for regression.

And indeed I saw any numbers between 620ms and 1292ms so any taken profiles would not allow the confidence needed to confirm a regression in GV.

Wanted to confirm with you that at this point the next steps for investigating this and trusting the results would be using measure_start_up.py --product=geckoview nightly cold_view_nav_start (the script would need minor modifications). I don't have now the setup to build GeckoView locally so I would report the regression to Bugzilla and leave the team there continue the investigation.

Mugurell avatar Aug 09 '22 14:08 Mugurell

@Mugurell From your previous comment https://github.com/mozilla-mobile/focus-android/issues/6923#issuecomment-1209002708, we can conclude the regression occurred due to a GV update. As you point out, the easiest way to find the root cause of the regression would be to bisect the commits to GeckoView (either by changing the version of GV in fenix or with GVE). Unfortunately, the tools don't make this easy currently. The second easiest method is usually to take profiles.

I don't have now the setup to build GeckoView locally so I would report the regression to Bugzilla and leave the team there continue the investigation.

This makes sense to me but it depends on how the GV team and the fenix team agree to handle regressions in GV. If you file an issue in Bugzilla, please add the "Performance: ?" flag.

And indeed I saw any numbers between 620ms and 1292ms so any taken profiles would not allow the confidence needed to confirm a regression in GV.

You generally want to compare a profile before the regression and after the regression. If there is too much variance in duration such that they're incomparable, one technique I use is to take several profiles for each scenario (5? More?), take the shortest for each (e.g. I often set the track to the first doOnPreDraw marker and compare the length at the top left), and compare those. While this isn't perfect, you're generally getting the run with minimal noise for each scenario which is more comparable than two runs with arbitrary amounts of noise.

mcomella avatar Aug 12 '22 21:08 mcomella

Sorry for my slow update here, my trusty Pixel 3 had to be sent for repair.


Based on the Focus data from https://github.com/mozilla-mobile/focus-android/issues/6923#issuecomment-1206192477 I followed the steps from https://profiler.firefox.com/docs/#/./guide-remote-profiling and https://profiler.firefox.com/docs/#/./guide-startup-shutdown?id=firefox-for-android to profile the startup of official Focus Nightly builds (arm64-v8a) on a Pixel 3 with Android 12.

At a quick glance the data is not consistent enough, that's why I've taken multiple profiles following the same steps:

  • with "Remote debugging via USB/Wi-Fi" disabled
  • with the latest Release (107.0) of Firefox for Mac installed and about:debugging opened
  • adb shell am start-activity -d "https://www.mozilla.org/" -a android.intent.action.VIEW org.mozilla.focus.nightly
  • In the newly opened app I then enabled "Remote debugging via USB/Wi-Fi"
  • connected to the app in about:debugging on Desktop
  • clicked "Start recording" -> "Capture profile"

Results:

for the "good" build for the "bad" build
https://share.firefox.dev/3tSmGXy https://share.firefox.dev/3ET2c76
https://share.firefox.dev/3UX238g https://share.firefox.dev/3GyDfiF
https://share.firefox.dev/3Ov5eBz https://share.firefox.dev/3TTa5hj
https://share.firefox.dev/3tXOLMN https://share.firefox.dev/3U078LU
https://share.firefox.dev/3tPzIoH https://share.firefox.dev/3OqVUin

@mcomella @MarcLeclair Please advise.

Mugurell avatar Nov 22 '22 16:11 Mugurell

Moved to bugzilla: https://bugzilla.mozilla.org/show_bug.cgi?id=1802642

Change performed by the Move to Bugzilla add-on.

cpeterso avatar Nov 26 '22 05:11 cpeterso