hyperswitch-web
[RCA Incident]: 4xx Missing address.state error
RCA Discussion Thread
https://juspay.slack.com/archives/C050U62NJG2/p1707538445249779 https://juspay.slack.com/archives/C05AU3N35FT/p1707730696417039
Detection Method
Message Alert
Incident Triggered By
Implementation bug
Executive Summary
Multiple alerts were triggered for 400 status codes in /confirm responses, with the error message “Missing required param: address.state”. After adding and analyzing logs, we found that the dynamic fields were intermittently not rendered, mostly on iPhone devices. The issue could not be reproduced on macOS browsers; it was reproduced only with an iPhone simulator.
Timeline of The Incident
Timestamp | What happened |
---|---|
Sat 10th Feb, 09:44 | Received a 4xx alert on Slack |
Sat 10th Feb, 13:42 | Impact analysis: failures occurred intermittently, mostly on iPhones |
Sun 11th Feb, 19:31 | Sev-1 call started |
Sun 11th Feb, 22:28 | Additional debug logs were added and deployed to production |
Tue 13th Feb, 18:07 – 20:10 | Fix for the intermittent dynamic field rendering issue released, rolled out in stages from 0% to 100% |
IMPACT - Technical
- The SDK intermittently did not render dynamic fields (mostly on iPhone devices).
- Payment Confirm requests were failing with the error “Missing required param: address.state”.
IMPACT - Business
122 payments were affected overall since dynamic fields were introduced (~4% of total payments)
1st Why?
Why did the SDK get a 400 "Missing required param: address.state" from /confirm?
The billing object was being sent as null in the confirm request.
2nd Why?
Why did the SDK not send the address state?
The dynamic fields, which are responsible for the billing fields, were not rendered, so the billing values were never populated.
3rd Why?
Why were the address input fields intermittently not rendered?
Because of a race condition between a useEffect0 and an event handler in the same file. Both modified the same recoilState value (the one responsible for setting the paymentMethods response).
4th Why?
Why was there a race condition?
The useEffect0 was setting the recoilState to the semiLoaded state after the event handler had already set it to the Loaded(Js.Json.t) state.
This is a lost update: two operations update the same data independently, with no synchronization or isolation between them, so the later write overwrites the earlier one. Here the effect's write discarded the Loaded response stored by the event handler.
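The ordering problem described above can be reproduced in a few lines. This is a minimal sketch in TypeScript rather than the SDK's own code: the `PaymentMethodsState` type, `createAtom`, `mountEffect`, and `onPaymentMethodsMessage` are hypothetical stand-ins for the recoilState, the useEffect0, and the event handler, and the response payload is made up.

```typescript
// Hypothetical model of the shared state; names are illustrative only.
type PaymentMethodsState =
  | { kind: "Loading" }
  | { kind: "SemiLoaded" }
  | { kind: "Loaded"; data: unknown };

interface Atom<T> {
  get(): T;
  set(next: T): void;
}

// Tiny stand-in for a shared state atom with an unconditional setter.
function createAtom<T>(initial: T): Atom<T> {
  let value = initial;
  return {
    get: () => value,
    set: (next) => {
      value = next;
    },
  };
}

// Stand-in for the mount effect: unconditionally writes SemiLoaded.
function mountEffect(atom: Atom<PaymentMethodsState>): void {
  atom.set({ kind: "SemiLoaded" });
}

// Stand-in for the event handler: stores the payment methods response.
function onPaymentMethodsMessage(
  atom: Atom<PaymentMethodsState>,
  data: unknown
): void {
  atom.set({ kind: "Loaded", data });
}

// Runs both writers in the given order and returns the final state.
function runScenario(
  order: "effectFirst" | "handlerFirst"
): PaymentMethodsState {
  const atom = createAtom<PaymentMethodsState>({ kind: "Loading" });
  const response = { payment_methods: [] }; // made-up payload
  if (order === "effectFirst") {
    mountEffect(atom);
    onPaymentMethodsMessage(atom, response);
  } else {
    onPaymentMethodsMessage(atom, response);
    mountEffect(atom);
  }
  return atom.get();
}

console.log(runScenario("effectFirst").kind); // prints "Loaded"
console.log(runScenario("handlerFirst").kind); // prints "SemiLoaded"
```

When the handler runs first, the final state is `SemiLoaded` and the response is gone. In this model, that is the state in which the dynamic fields would have nothing to render from.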
5th Why?
Why was this happening intermittently, and mostly on iPhone browsers?
Not conclusively determined. Event handling behaviour varies across browsers and devices, and iPhones appeared to be the most commonly affected device.
Mitigation
- Added logs to capture more data for debugging the issue.
- Fixed dependency handling in dynamic fields rendering (this did not resolve the issue).
- Fixed the race condition between the useEffect and the event handler by making their setRecoilState updates happen in a synchronous manner.
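One general way to make this kind of state update insensitive to ordering is to express each write as a function of the previous state and never downgrade a state that is already loaded. The sketch below shows that idea only; it is not the fix that was shipped, and `PaymentMethodsState`, `markSemiLoaded`, `storeResponse`, and `applyInOrder` are hypothetical names. Recoil's setter accepts an updater function of the previous value, so transitions of this shape could be passed to it.

```typescript
// Hypothetical model of the shared state; names are illustrative only.
type PaymentMethodsState =
  | { kind: "Loading" }
  | { kind: "SemiLoaded" }
  | { kind: "Loaded"; data: unknown };

type Updater = (prev: PaymentMethodsState) => PaymentMethodsState;

// Effect-side transition: only advance from Loading, never overwrite
// a state that the event handler has already moved forward.
const markSemiLoaded: Updater = (prev) =>
  prev.kind === "Loading" ? { kind: "SemiLoaded" } : prev;

// Handler-side transition: the response always wins.
const storeResponse =
  (data: unknown): Updater =>
  () => ({ kind: "Loaded", data });

// Applies updaters one after another, starting from Loading.
function applyInOrder(updaters: Updater[]): PaymentMethodsState {
  return updaters.reduce<PaymentMethodsState>(
    (state, update) => update(state),
    { kind: "Loading" }
  );
}

const response = { payment_methods: [] }; // made-up payload

const effectFirst = applyInOrder([markSemiLoaded, storeResponse(response)]);
const handlerFirst = applyInOrder([storeResponse(response), markSemiLoaded]);

console.log(effectFirst.kind); // prints "Loaded"
console.log(handlerFirst.kind); // prints "Loaded"
```

Both orderings end in `Loaded`, because the effect-side transition is a no-op once the response has been stored.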
Action Items
- [ ] #258
Lessons learnt
- The issue should have been caught in PR review: two operations modifying a shared state can race because of their asynchronous nature.
- Test critical flows across devices.
- Repeat the same test cases multiple times to rule out intermittent issues.