[RCA Incident]: 4xx Missing address.state error

Open akash-c-k opened this issue 11 months ago • 0 comments

RCA Discussion Thread

https://juspay.slack.com/archives/C050U62NJG2/p1707538445249779
https://juspay.slack.com/archives/C05AU3N35FT/p1707730696417039

Detection Method

Message Alert

Incident Triggered By

Implementation bug

Executive Summary

Multiple alerts were triggered for 400 status codes in /confirm responses, with the error message “Missing required param: address.state”. After adding and analyzing logs, we found that the dynamic fields were intermittently not rendered, mostly on iPhone devices. The issue was not reproducible on macOS browsers; it could be reproduced only on an iPhone simulator.

Timeline of The Incident

| Timestamp | What happened |
| --- | --- |
| Sat 10th Feb, 09:44 | Got a 4xx alert on Slack |
| Sat 10th Feb, 13:42 | Impact analysis: occurring intermittently, mostly on iPhones |
| Sun 11th Feb, 19:31 | Sev-1 call started |
| Sun 11th Feb, 22:28 | Additional debug logs were added and deployed to production |
| Tue 13th Feb, 18:07 to 20:10 | Fix for the intermittent dynamic field render issue; release staggered from 0 to 100% |

IMPACT - Technical

  • The SDK intermittently did not render dynamic fields (mostly on iPhone devices).
  • Payment Confirm requests were failing with the error “Missing required param: address.state”.

IMPACT - Business

122 payments were affected overall since dynamic fields were introduced (~4% of total payments)

1st Why?

Why did the SDK get a 400 "Missing required param: address.state" from /confirm?

The billing object was being sent as null in the confirm request.

2nd Why?

Why did the SDK not send the address state?

The dynamic fields, which are responsible for the billing fields, were not rendered and hence were never populated.

3rd Why?

Why were the address input fields intermittently not rendered?

A race condition between a useEffect0 and an event handler in the same file: both modified the same recoilState value (the one responsible for setting the paymentMethods response).

4th Why?

Why was there a race condition?

The useEffect0 was setting the recoilState to the semiLoaded state after the event handler had already set it to the Loaded(Js.Json.t) state, so the loaded response was overwritten. This is a lost update: two operations concurrently read the same data and then update it independently and, without proper synchronization or isolation, one update overwrites the other and is lost.
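
The lost update described above can be illustrated with a small, deterministic model. This is only a sketch: the SDK itself is not written in TypeScript, and `Store`, `mountEffect`, `onPaymentMethodsEvent`, and the `PaymentMethodsState` type are hypothetical stand-ins for the recoilState, the useEffect0, and the event handler. It shows that a last-write-wins store ends in the wrong state when the handler happens to run before the effect.

```typescript
// Hypothetical model of the shared state; names mirror the RCA, not the SDK's real types.
type PaymentMethodsState =
  | { tag: "Loading" }
  | { tag: "SemiLoaded" }
  | { tag: "Loaded"; data: unknown };

// Minimal stand-in for the shared atom: the last write always wins.
class Store {
  state: PaymentMethodsState = { tag: "Loading" };
  set(next: PaymentMethodsState): void {
    this.state = next;
  }
}

// Stand-in for the mount effect: unconditionally writes SemiLoaded.
const mountEffect = (store: Store): void => store.set({ tag: "SemiLoaded" });

// Stand-in for the event handler: writes the payment methods response.
const onPaymentMethodsEvent = (store: Store, data: unknown): void =>
  store.set({ tag: "Loaded", data });

// Runs both writers in the given order and reports the final state tag.
function run(order: "effectFirst" | "handlerFirst"): PaymentMethodsState["tag"] {
  const store = new Store();
  const response = { payment_methods: [] };
  if (order === "effectFirst") {
    mountEffect(store);
    onPaymentMethodsEvent(store, response);
  } else {
    onPaymentMethodsEvent(store, response);
    mountEffect(store); // overwrites Loaded: this is the lost update
  }
  return store.state.tag;
}

console.log(run("effectFirst")); // "Loaded"
console.log(run("handlerFirst")); // "SemiLoaded"
```

In the second ordering the response has arrived but the state no longer says so, which matches the symptom of dynamic fields never being rendered.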

5th Why?

Why was this happening intermittently, and mostly on iPhone browsers?

Unclear. Event handling behaviour varies across browsers and devices, and iPhones seemed to be the most commonly affected devices.

Mitigation

  • Added logs to capture more data for debugging the issue.
  • Attempted a fix for handling dependencies in dynamic field rendering (did not work).
  • Fixed the race condition between the useEffect0 and the event handler by making both set the recoilState in a synchronous manner.
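
One general way to make such writes order-independent is to have every writer go through an updater that refuses to move the state backwards. The sketch below illustrates that idea only; it is not necessarily the fix that was shipped, and `PmState`, `rank`, `advance`, and `GuardedStore` are hypothetical names.

```typescript
// Hypothetical state type; not the SDK's real definition.
type PmState =
  | { tag: "Loading" }
  | { tag: "SemiLoaded" }
  | { tag: "Loaded"; data: unknown };

// Progress rank of each state; a write may only keep or increase the rank.
const rank: Record<PmState["tag"], number> = { Loading: 0, SemiLoaded: 1, Loaded: 2 };

// Returns the state to store: a write that would downgrade the state is ignored.
function advance(prev: PmState, next: PmState): PmState {
  return rank[next.tag] >= rank[prev.tag] ? next : prev;
}

class GuardedStore {
  state: PmState = { tag: "Loading" };
  // Updater-style setter: the new state is always computed from the current one.
  update(fn: (prev: PmState) => PmState): void {
    this.state = fn(this.state);
  }
}

const store = new GuardedStore();
const response = { payment_methods: [] };

// The event handler runs first, then the late effect tries to write SemiLoaded.
store.update((prev) => advance(prev, { tag: "Loaded", data: response }));
store.update((prev) => advance(prev, { tag: "SemiLoaded" })); // ignored

console.log(store.state.tag); // "Loaded"
```

With this guard the final state is the same whichever writer runs first, so the outcome no longer depends on browser or device timing.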

Action Items

  • [ ] #258

Lessons learnt

  • The issue should have been caught in PR review: two operations modifying a common state can cause race conditions because of their asynchronous nature.
  • Test critical flows across devices.
  • Repeat the same test cases multiple times to rule out intermittent issues.
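
The last point can be sketched as a tiny repeat harness. Everything here is illustrative: `repeatScenario` and `rendersDynamicFields` are hypothetical, and the bad ordering is simulated on every 25th iteration rather than produced by a real browser. The point is that a single run passes while repeated runs expose the failure.

```typescript
// Hypothetical harness: runs a scenario repeatedly and returns the iterations that failed.
function repeatScenario(times: number, scenario: (iteration: number) => boolean): number[] {
  const failures: number[] = [];
  for (let i = 0; i < times; i++) {
    if (!scenario(i)) failures.push(i);
  }
  return failures;
}

// Illustrative flaky scenario: the two writes land in the bad order on every 25th run.
const rendersDynamicFields = (iteration: number): boolean => {
  const handlerRanFirst = iteration % 25 === 24;
  const writes = handlerRanFirst ? ["Loaded", "SemiLoaded"] : ["SemiLoaded", "Loaded"];
  let state = "Loading";
  for (const write of writes) state = write; // last write wins
  return state === "Loaded"; // fields render only when the final state is Loaded
};

console.log(repeatScenario(1, rendersDynamicFields)); // []
console.log(repeatScenario(100, rendersDynamicFields)); // [24, 49, 74, 99]
```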

akash-c-k • Mar 05 '24 07:03