JS alert completion handler crashes and API updates
This is a single ticket to capture several related action items related to crashes from JS alert completion handlers that we’ve been seeing in production since at least v127 (and likely before).
Context/Background
For important context please see:
https://mozilla-hub.atlassian.net/browse/FXIOS-10315
https://mozilla-hub.atlassian.net/browse/FXIOS-10326
and also related Slack threads:
https://mozilla.slack.com/archives/C05C9RET70F/p1729096888361509
and 10/18 incident channel: #incident-ios-crash-increase
Simplified overview
- Known test cases for replicating the JS crash alerts are listed below, however there are likely also still crash scenarios we do not have repro steps for
- Additionally it was discovered that our
MessageAlertcompletion handlers were being called incorrectly, per WebKit docs - 3 of the known crashes / test cases were fixed in 131.1-3, however those changes have been reverted as part of the 10/18 incident, as we discovered the crash rate in 131.3 actually worsened
- As of 10/18, we are now back to where we started on
main:- We are calling MessageAlert’s completion handler in the wrong place (it needs to be called when the alert is dismissed, not after presenting)
- We have 6 (*TBD) test cases for replicating related crashes in production.
Action items
In order to try to reduce confusion, all of the above related issues with our JS alerts will be tracked here. The changes needed are:
- Fix our incorrect use of
MessageAlert(see notes above), by ensuring thecompletionHandleris called only after the alert is dismissed, not as soon as it’s presented - Address the crashes
- Sub-action item: move the JS code for test cases 4-6 from ww3schools to a more convenient test page
- Identify (via Sentry breadcrumbs, logs) what the JS alert crash scenario was that led to a crash increase in 131.3
- Investigate removing JS alert dequeue which is called from BVC
viewDidLayoutSubviews - This UI func is not a good place to be performing an alert dequeue, it’s currently unclear why it was added there
Caution needed ❗
We have implemented QA-validated fixes for these crashes in the past, and counter-intuitively this led to an increase in related crashes in Production. Unfortunately it seems that this code is extremely brittle, and so we need to use care as part of this ticket to ensure we don’t once again cause a dip in crash-free sessions.
Known test cases / crashes
TEST CASE 1
Steps logged in this Bugzilla (below) produce a crash in the JS alert:
- Visit https://mattreaganmozilla.github.io/test
- Click ‘Broken’ button
TEST CASE 2
- Visit https://mattreaganmozilla.github.io/test
- Tap on ‘Two alerts with delay’
- Have another unrelated alert (UIAlertController) presented
- Wait for the delay to pass, and the JS alerts to be enqueued
- Rotate the device
Result: the rotation triggers a viewDidLayoutSubviews which will actually dequeue an alert and leads to a crash since we’re not checking whether we can present the alert safely.
https://bugzilla.mozilla.org/show_bug.cgi?id=1922283
TEST CASE 3
- Visit https://mattreaganmozilla.github.io/test
- Tap (or try tapping twice, if needed) on ‘Two alerts with delay’
- Switch to another tab, wait 5 seconds, then switch back
(Result: crash, note that this specific crash may only repro in 131.1 & 131.2 but we need to regression test this scenario anyway in any builds from 132+)
TEST CASES 4-6
NOTE: These test cases are running the JS through ww3 schools, as one of the action items I am planning to move these scripts to the test page at https://mattreaganmozilla.github.io/test for convenience, will update the repro steps as needed once that is done.
Please note that we do not reproduce this issue if we run the alerts if opening w3schools > access tab tray (wait for the alert to be displayed) > Close the w3school tab > no crashes.
We are still reproducing this issue following the 3 cases below: Devices: iPhone 14 Pro (16.2), iPhone 15+ (18), iPhone 15 Pro (17.5), iPhone 15 (17.6).
- Alert
{{
The Window Object
The alert() Method
Click the button to display an alert box after 5 seconds.
}}Steps:
- Open https://www.w3schools.com/html/tryit.asp?filename=tryhtml_default_default
- Copy and Paste the Alert code and then press run
- Scroll down and tap on Try it
- Fast open the tabs tray and open a new tab (can be the homepage)
- Go back to the tabs tray.
- Tap on the
xbutton and close the w3schools website first and then the remaining homepage tab.
Actual results: FF crashes when you first close the w3schools tab, or when closing the 2nd opened tab (even if it was a homepage).
- Confirm message
<!DOCTYPE html> <html> <body> <h1>The Window Object</h1> <h2>The alert() Method</h2> <p>Click the button to display an alert box after 5 seconds.</p> <button onclick="myFunction()">Try it</button> <script> function myFunction() { // Delay the alert for 5 seconds (5000 milliseconds) setTimeout(function() { confirm("this is a confirm message"); }, 2000); } </script> </body>
Steps:
- Open https://www.w3schools.com/html/tryit.asp?filename=tryhtml_default_default
- Copy and Paste the Alert code and then press run
- Scroll down and tap on Try it
- Fast open the tabs tray and open a new tab (can be the homepage)
- Go back to the tabs tray.
- Tap on the
xbutton and close the w3schools website first and then the remaining homepage tab.
Actual results: FF crashes when you first close the w3schools tab, or when closing the 2nd opened tab (even if it was a homepage).
- Prompt
<!DOCTYPE html> <html> <body> <h1>The Window Object</h1> <h2>The alert() Method</h2> <p>Click the button to display an alert box after 5 seconds.</p> <button onclick="myFunction()">Try it</button> <script> function myFunction() { // Delay the alert for 5 seconds (5000 milliseconds) setTimeout(function() { prompt("test prompt"); }, 2000); } </script> </body>
Steps:
- Open https://www.w3schools.com/html/tryit.asp?filename=tryhtml_default_default
- Copy and Paste the Alert code and then press run
- Scroll down and tap on Try it
- Fast open the tabs tray and open a new tab (can be the homepage)
- Go back to the tabs tray.
- Tap on the
xbutton and close the w3schools website first and then the remaining homepage tab.
Actual results: FF crashes when you first close the w3schools tab, or when closing the 2nd opened tab (even if it was a homepage).
Added the crash logs.
┆Issue is synchronized with this Jira Bug
➤ Matt Reagan commented:
Follow-up note: fixes were previously implemented for test cases 1-3 (includes Bugzilla) as well as correcting the API misuse of the MessageAlert completion handler (see ticket description); these updates were originally merged across 131.1-3.
However, our crash rate in production went up, which means that some other crash scenario (which was occurring more frequently than the crashes we fixed) was actually exacerbated. The fixes were subsequently reverted in 131.4.
Orla mentioned during the 10/18 incident discussions that this area of the codebase had caused similar problems some time ago and she and Yoanna had been working in this area, and it was especially brittle. Orla Mitchell sending your way since based on the 10/18 chats it sounds like you may have specific ideas for how you’d like to implement these changes.
➤ Laurie Marceau commented:
Did a quick historical overview of the changes related to JS crashes. We’ve been trying to tackle those JS alert related issues since v110. Here’s the overview:
January 26 2023 - v110Added a showQueuedAlertIfAvailable() in viewDidLayoutSubviews()
https://github.com/mozilla-mobile/firefox-ios/pull/13044 ( https://github.com/mozilla-mobile/firefox-ios/pull/13044|smart-link )
February 21st 2023 - v110.1Revert previous code change as this caused a crash increase
https://github.com/mozilla-mobile/firefox-ios/pull/13297 ( https://github.com/mozilla-mobile/firefox-ios/pull/13297|smart-link )
February 23st 2023 - v112Call completion handler was not called from SceneDelegate. This PR added showQueuedAlertIfAvailable in viewDidLayoutSubviews and moved when the completion is called https://github.com/mozilla-mobile/firefox-ios/pull/13340 ( https://github.com/mozilla-mobile/firefox-ios/pull/13340|smart-link )
May 3rd 2023 - v114QR code use case. https://github.com/mozilla-mobile/firefox-ios/pull/14230 ( https://github.com/mozilla-mobile/firefox-ios/pull/14230|smart-link )
With Jira ticket https://mozilla-hub.atlassian.net/browse/FXIOS-6201 ( https://mozilla-hub.atlassian.net/browse/FXIOS-6201|smart-link )
July 18th 2023 - v117Confirmation prompt use case bugfix attempt https://github.com/mozilla-mobile/firefox-ios/pull/15629 ( https://github.com/mozilla-mobile/firefox-ios/pull/15629|smart-link )
With Jira ticket https://mozilla-hub.atlassian.net/browse/FXIOS-6678 ( https://mozilla-hub.atlassian.net/browse/FXIOS-6678|smart-link )
October 3rd-16th 2024 - v130PRs to revert some changes, i.e moving the completion handler to be called when the alert is dismissed, fixing some issues where the completion handler was called twice
https://github.com/mozilla-mobile/firefox-ios/pull/22392 ( https://github.com/mozilla-mobile/firefox-ios/pull/22392|smart-link )
https://github.com/mozilla-mobile/firefox-ios/pull/22602 ( https://github.com/mozilla-mobile/firefox-ios/pull/22602|smart-link )
October 3rd-16th 2024 - v130.1Reverted both previous PR to fix incident
https://github.com/mozilla-mobile/firefox-ios/pull/22645 ( https://github.com/mozilla-mobile/firefox-ios/pull/22645|smart-link )
https://github.com/mozilla-mobile/firefox-ios/pull/22641 ( https://github.com/mozilla-mobile/firefox-ios/pull/22641|smart-link )
In summaryWe’ve made various attempt at moving when the completion handler is called. As it was noted already, when showQueuedAlertIfAvailable() is called is part of the issue but was historically causing crashes beforehand as well. We never were in a good place with JS alert handling.
Path forwardI want to take some time to assess
- Competitors and how they handle JS alerts
- We don’t want another incident and to address the points in this ticket I would introduce a feature flag. This would give us an easy toggle if we need to revert changes in production. We know this area is fragile and we might be missing test use cases, so let’s handle it with care.
- There’s many points to address in this ticket, this will mean multiple PRs. I almost see this ticket as an epic to this point. Once we have a plan for a solution, I’ll layout some subtasks probably to easily keep track of the work.
➤ Isabella commented:
When the fix has been validated in production, there is a follow up task here (https://mozilla-hub.atlassian.net/browse/FXIOS-11031 ( https://mozilla-hub.atlassian.net/browse/FXIOS-11031|smart-link ) ) to investigate the mysterious JS Alert completion handling within the ShareSheetCoordinator.
➤ Andrei Bodea commented:
Hello, Isabella Laurie Marceau Matt Reagan Hello, anything for QA in order to verify this issue?
➤ Isabella commented:
cc Laurie Marceau Can you let Andrei Bodea know how and what to test? I think you needed to enable a feature flag? 🤔
➤ Laurie Marceau commented:
Andrei Bodea There’s 6 use cases listed in the description of this task. None of those should crash.
And like commented here ( https://mozilla-hub.atlassian.net/browse/FXIOS-10214?focusedCommentId=996605 ), the experiment should be enabled in the release build you are testing.