Builder SDK - servers hung until we restarted them
What happened
During the incident, all page requests timed out except for pages cached by Cloudflare. See the screenshot below.
Our investigations and findings
- A build was in progress that calls the Builder API to get page content, but it never finished. We killed it from the Heroku command line. The incident started roughly 10 minutes after the build started
- Dyno memory usage looked normal
- There were no traffic spikes
- Calls to the Builder API looked normal
Actually the outgoing calls to Builder stopped during the incident until we restarted the dynos:
[image: image] https://user-images.githubusercontent.com/43394294/120869895-58a00300-c54c-11eb-92a3-95e376c2bab5.png
For comparison, the traffic going out to our internal API increased during the incident (probably due to people refreshing the pages):
[image: image] https://user-images.githubusercontent.com/43394294/120869915-68b7e280-c54c-11eb-93f2-084e821da8cd.png
Thanks Guangyu. I'm wondering why we still have metric readings in the response time graph.
One more thing I noticed is that the number of requests to the Builder API gradually decreased.
thanks @gydongAP and @XianfuZhengAfterpay - this is helpful
the next thing we need to try and do is isolate what happened here. shy of having full access to your code and understanding how different states are handled, I am wondering if there is a way we can isolate a potential problem in the SDK
shy of seeing your code directly and knowing the nuances of how your stack handles certain situations I'm having trouble thinking of how to reproduce these findings in isolation - would it be possible for you to try and take a stab at this?
- Shahar
@Shahar Sharon since you are more familiar with the codebase, do you mind providing some insights here?
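One possible way to start isolating this, as a rough sketch only: wrap each content fetch in a timeout and count how many calls are still pending, so a call that never settles shows up in logs instead of silently holding a request open. Everything named here (`withTimeout`, `InFlightTracker`, the timeout values) is hypothetical and not part of the Builder SDK. The call being wrapped is passed in as a plain function, so no SDK signature is assumed.

```typescript
// Hypothetical helper, not part of the Builder SDK.
// Rejects if `work` has not settled within `ms` milliseconds.
// Note: this does not cancel the underlying call, it only stops waiting for it.
async function withTimeout<T>(work: Promise<T>, ms: number, label: string): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`${label} did not settle within ${ms} ms`)),
      ms
    );
  });
  try {
    return await Promise.race([work, timeout]);
  } finally {
    // Clear the timer so it does not keep the process alive after `work` settles.
    if (timer !== undefined) clearTimeout(timer);
  }
}

// Hypothetical tracker: counts wrapped calls that have neither settled nor timed out.
class InFlightTracker {
  private pending = 0;

  get count(): number {
    return this.pending;
  }

  async track<T>(call: () => Promise<T>, ms: number, label: string): Promise<T> {
    this.pending += 1;
    try {
      return await withTimeout(call(), ms, label);
    } finally {
      this.pending -= 1;
    }
  }
}
```

In a repro, the function passed to `track` would be whatever call fetches page content from Builder, and `count` could be logged periodically. If the count keeps climbing while outgoing requests drop off, that would point toward calls hanging rather than failing.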